$\begingroup$

For which characters c does ToCharacterCode[c] not return the same value as c's Unicode code point? And how does one convert c to its Unicode code point in Mathematica in general?

$\endgroup$

1 Answer

$\begingroup$

You can see here.

For a character in the range U+0000 to U+D7FF or U+E000 to U+FFFF, ToCharacterCode[c] returns a single number equal to c's Unicode code point.

For a character in the range U+10000 to U+10FFFF, ToCharacterCode[c] returns two numbers (the UTF-16 surrogate pair for that code point), and Mathematica treats the character as two characters.

For example:

In[1]:= ToCharacterCode /@ {"$", "€", "𐐷", "𤭢"}
Out[1]= {{36}, {8364}, {55297, 56375}, {55378, 57186}}

In[2]:= StringLength@"𐐷"
Out[2]= 2

In fact, the Unicode code point of "𐐷" is U+10437, which is 66615 in decimal. The pair {55297, 56375} is exactly IntegerDigits[66615 - 65536, 1024] + {55296, 56320}, where 55296 is 0xD800 and 56320 is 0xDC00.
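The arithmetic for "𐐷" can be checked step by step. This is only an illustrative sketch; the names n, hi and lo are made up for the example and are not part of any built-in API.

```mathematica
(* Offset of U+10437 above the Basic Multilingual Plane *)
n = 66615 - 65536                        (* 1079 *)

(* Split the offset into a high and a low 10-bit part *)
{hi, lo} = QuotientRemainder[n, 1024]    (* {1, 55} *)

(* Add the surrogate bases 0xD800 and 0xDC00 *)
{hi, lo} + {16^^D800, 16^^DC00}          (* {55297, 56375} *)
```

The same steps applied to "𤭢" (U+24B62, decimal 150370) give {82, 866} and then {55378, 57186}, matching Out[1].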

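The following function converts a Unicode code point to the list of character codes that Mathematica uses for it. The third argument of IntegerDigits pads the result to two digits; without it, code points from U+10000 to U+103FF would produce a single digit and the addition would fail.

If[# < 65536, {#}, IntegerDigits[# - 65536, 1024, 2] + {55296, 56320}] & 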
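For the direction the question actually asks about (character to code point), one possible approach is to reverse the mapping: take the output of ToCharacterCode and merge each high/low surrogate pair back into a single code point. The following is a minimal sketch under that assumption; toCodePoints is a hypothetical helper name, not a built-in function.

```mathematica
(* Hypothetical helper: merge UTF-16 surrogate pairs returned by
   ToCharacterCode into Unicode code points.
   High surrogates: 55296..56319 (0xD800..0xDBFF)
   Low surrogates:  56320..57343 (0xDC00..0xDFFF) *)
toCodePoints[s_String] :=
  ToCharacterCode[s] //. {
    a___,
    hi_?(55296 <= # <= 56319 &),
    lo_?(56320 <= # <= 57343 &),
    b___
  } :> {a, 65536 + 1024 (hi - 55296) + (lo - 56320), b}

toCodePoints["𐐷"]     (* {66615} *)
toCodePoints["$€"]    (* {36, 8364} *)
```

Because the rule only fires on a high surrogate followed by a low surrogate, code lists that contain no surrogate pairs pass through unchanged.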
$\endgroup$
  • $\begingroup$ What about the characters between U+D7FF and U+E000? $\endgroup$ Commented Mar 19, 2015 at 2:07
  • $\begingroup$ @qazwsx They are the surrogate code points (U+D800 to U+DFFF), which are not assigned to characters. $\endgroup$ Commented Mar 19, 2015 at 2:21
  • $\begingroup$ 55296 is 0xD800. What's 56320? $\endgroup$ Commented Mar 19, 2015 at 3:57
  • $\begingroup$ @qazwsx 0xDC00. $\endgroup$ Commented Mar 19, 2015 at 5:50
  • $\begingroup$ Uh... this question asks for char -> Unicode code point, not vice versa. $\endgroup$ Commented Jul 4, 2018 at 16:27
