
Or do these two characters simply not exist in Shift_JIS?

The first 128 characters in the Shift_JIS character encoding scheme match ASCII except for two: 0x5C is a Yen symbol (¥) instead of a backslash, and 0x7E is an overline (‾) instead of a tilde.
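To make the single-byte picture concrete, here is a minimal Python sketch of the 7-bit half as described above: a table that differs from ASCII only at 0x5C and 0x7E. The names `JIS_ROMAN_DIFFERENCES` and `decode_low_byte` are hypothetical, and treating control codes as plain ASCII is a simplifying assumption.

```python
# Sketch of the 7-bit half of Shift_JIS as described above: identical to
# ASCII except that 0x5C and 0x7E carry JIS X 0201 Roman characters.
# The table and helper names are illustrative, not taken from any library.
JIS_ROMAN_DIFFERENCES = {
    0x5C: "\u00a5",  # YEN SIGN in place of REVERSE SOLIDUS (backslash)
    0x7E: "\u203e",  # OVERLINE in place of TILDE
}


def decode_low_byte(b: int) -> str:
    """Decode a single byte in 0x00-0x7F as JIS X 0201 Roman."""
    if not 0 <= b <= 0x7F:
        raise ValueError(f"byte 0x{b:02X} is not in the 7-bit range")
    # Every other byte (including C0 control codes, a simplifying assumption)
    # decodes to the same code point as in ASCII.
    return JIS_ROMAN_DIFFERENCES.get(b, chr(b))


if __name__ == "__main__":
    low_half = {decode_low_byte(b) for b in range(0x80)}
    print(f"U+{ord(decode_low_byte(0x5C)):04X}")  # U+00A5
    print(f"U+{ord(decode_low_byte(0x7E)):04X}")  # U+203E
    print("\\" in low_half, "~" in low_half)      # False False
```

Because no byte in this range decodes to a backslash or a tilde, the question of whether they exist elsewhere in the encoding is a fair one.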

While there's plenty of clear information about how ¥ and ‾ take over for \ and ~, I haven't been able to find any clear statement about whether \ and ~ simply don't exist in Shift_JIS, or whether there are alternate (probably multi-byte) encodings for these two displaced ASCII characters.

When I try to encode \ or ~ using node-iconv, it throws an error.

iconv-lite encodes both ¥ and \ as 0x5C, and both ‾ and ~ as 0x7E. When decoding, iconv-lite currently (and unfortunately) decodes 0x5C as \ and 0x7E as ~, pending a response to a bug report.
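The asymmetry described here can be modelled in a few lines. This is a hypothetical sketch of the behaviour as described in the question, not iconv-lite's code or API; `lenient_encode` and `lenient_decode` are illustrative names. It shows why such a round trip is lossy: ¥ goes in and \ comes back out.

```python
# Hypothetical model of the lenient single-byte behaviour described above
# (both YEN SIGN and backslash -> 0x5C, both OVERLINE and tilde -> 0x7E,
# 7-bit bytes decoded as plain ASCII). This is NOT iconv-lite's code or API.
def lenient_encode(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        if ch in ("\\", "\u00a5"):      # backslash or YEN SIGN
            out.append(0x5C)
        elif ch in ("~", "\u203e"):     # tilde or OVERLINE
            out.append(0x7E)
        elif ord(ch) < 0x80:            # other ASCII characters pass through
            out.append(ord(ch))
        else:
            raise ValueError(f"{ch!r} is outside this sketch's 7-bit range")
    return bytes(out)


def lenient_decode(data: bytes) -> str:
    # Decoding as ASCII turns 0x5C back into "\" and 0x7E into "~".
    return data.decode("ascii")


if __name__ == "__main__":
    price = "\u00a5100"                        # "¥100"
    restored = lenient_decode(lenient_encode(price))
    print(ascii(restored), restored == price)  # '\\100' False
```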


The character set of Shift_JIS is defined by JIS (Japanese Industrial Standards).

The Shift_JIS encoding uses JIS X 0201 for its half-width character set and JIS X 0208 for its full-width character set.
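Roughly, a decoder tells the two sets apart by the value of each byte. The sketch below assumes plain Shift_JIS byte ranges (single bytes 0x00-0x7F and 0xA1-0xDF, lead bytes 0x81-0x9F and 0xE0-0xEF, trail bytes 0x40-0x7E and 0x80-0xFC); vendor variants such as Windows code page 932 accept more lead bytes, which this hypothetical `classify_sjis` helper does not handle.

```python
# Minimal sketch: split a plain Shift_JIS byte stream into units and label
# which JIS character set each unit belongs to. Vendor extensions (extra
# lead bytes in code page 932) are deliberately not handled.
def classify_sjis(data: bytes) -> list[tuple[str, bytes]]:
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            units.append(("JIS X 0201 Roman", data[i:i + 1]))
            i += 1
        elif 0xA1 <= b <= 0xDF:
            units.append(("JIS X 0201 katakana", data[i:i + 1]))
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:
            # Lead byte of a two-byte JIS X 0208 character
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte sequence")
            trail = data[i + 1]
            if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC):
                raise ValueError(f"invalid trail byte 0x{trail:02X}")
            units.append(("JIS X 0208", data[i:i + 2]))
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02X} is not valid here")
    return units


if __name__ == "__main__":
    # "A", half-width katakana 0xB1, then the two-byte kanji 0x93FA
    for charset, unit in classify_sjis(b"A\xb1\x93\xfa"):
        print(charset, unit.hex().upper())
```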

By \ and ~, the question means the half-width backslash and tilde of ISO/IEC 8859-1 (Latin-1), right? JIS X 0201, the half-width character set, doesn't contain these characters (see https://en.wikipedia.org/wiki/JIS_X_0201).

So the answer is that neither \ nor ~ exists in Shift_JIS.
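A strict encoder, which is roughly what node-iconv's error suggests, simply has no byte to emit for these two characters. Here is a minimal sketch of the 7-bit JIS X 0201 Roman half, assuming the table from the question (ASCII with ¥ at 0x5C and ‾ at 0x7E); `ROMAN_ENCODE` and `strict_encode_roman` are hypothetical names, not node-iconv's implementation.

```python
# Hypothetical strict encoder for the 7-bit JIS X 0201 Roman half.
# ASCII minus backslash and tilde, plus YEN SIGN and OVERLINE.
ROMAN_ENCODE = {chr(b): b for b in range(0x80) if b not in (0x5C, 0x7E)}
ROMAN_ENCODE["\u00a5"] = 0x5C  # YEN SIGN
ROMAN_ENCODE["\u203e"] = 0x7E  # OVERLINE


def strict_encode_roman(text: str) -> bytes:
    out = bytearray()
    for pos, ch in enumerate(text):
        code = ROMAN_ENCODE.get(ch)
        if code is None:
            # Backslash (U+005C) and tilde (U+007E) end up here.
            raise UnicodeEncodeError("jis_x_0201_roman", text, pos, pos + 1,
                                     "character not in JIS X 0201 Roman")
        out.append(code)
    return bytes(out)


if __name__ == "__main__":
    print(strict_encode_roman("\u00a5100"))  # b'\\100'
    for ch in ("\\", "~"):
        try:
            strict_encode_roman(ch)
        except UnicodeEncodeError as exc:
            print(ascii(ch), "rejected:", exc.reason)
```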

For reference, JIS X 0208 contains a full-width backslash (FULLWIDTH REVERSE SOLIDUS, U+FF3C in Unicode). JIS X 0208 doesn't contain a full-width tilde, but the Windows equivalent of Shift_JIS (Microsoft code page 932) does (FULLWIDTH TILDE, U+FF5E in Unicode).
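The two-byte codes for these full-width characters can be checked with Python's built-in `cp932` codec, which implements Microsoft code page 932. A short check:

```python
import unicodedata

# Encode each full-width character with the standard-library cp932 codec
# and show the resulting two-byte code in hex.
for ch in ("\uff3c", "\uff5e"):
    encoded = ch.encode("cp932")
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {encoded.hex().upper()}")
# U+FF3C FULLWIDTH REVERSE SOLIDUS: 815F
# U+FF5E FULLWIDTH TILDE: 8160
```

These are double-byte characters, so they are visually similar substitutes rather than encodings of the ASCII \ and ~ themselves.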



It's very odd that two characters so commonly used in coding would be left out entirely! Displaced, I can understand, but missing altogether is surprising. I guess UTF-8 and other encodings have largely taken over these days, though, so this is just a legacy issue at this point.
It's not so odd. All it means is that Shift-JIS is not a suitable encoding for writing program source code. That doesn't make Shift-JIS any less suitable for encoding text documents. There is nothing preventing you from writing a program whose source code is in UTF-8 or ASCII, which processes text documents using Shift-JIS. An analogy is that you can write a C compiler using FORTRAN source code. Or write laws for France in the English language.
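As a small illustration of that point: a Python program, whose source is UTF-8 by default, can decode a Shift_JIS document at the input boundary and encode it again at the output. The byte string below stands in for a hypothetical file's contents.

```python
# The program's own source is UTF-8; Shift_JIS appears only at the I/O edges.
sjis_document = b"\x93\xfa\x96\x7b"        # "日本" encoded in Shift_JIS

text = sjis_document.decode("shift_jis")  # bytes -> str at the input boundary
print(len(text))                           # 2 characters, from 4 bytes

round_tripped = text.encode("shift_jis")  # str -> bytes at the output boundary
print(round_tripped == sjis_document)      # True
```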
