I am currently writing a C++ program that handles both Latin letters and Korean characters.
However, I learned that the size of char in C++ is only 1 byte. This means that in order to handle Korean characters or Unicode in general, it has to use two (or more) chars for one character.
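For reference, char is not the only character type C++ offers; a quick sketch of the built-in alternatives (assuming a C++11 compiler):

```cpp
#include <iostream>
using namespace std;

int main() {
    cout << sizeof(char)     << "\n";  // always 1 by definition
    cout << sizeof(wchar_t)  << "\n";  // 2 on Windows, 4 on most Unix systems
    cout << sizeof(char16_t) << "\n";  // 2: one UTF-16 code unit (C++11)
    cout << sizeof(char32_t) << "\n";  // 4: one whole code point fits (C++11)
}
```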
```cpp
string s = string("a가b나c다");
cout << s.length();  // prints 9
```
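Note that length() counts char units, i.e. bytes, not characters, which is why the result depends on the encoding. A minimal sketch, assuming a C++11 compiler that knows the source file's encoding:

```cpp
#include <iostream>
#include <string>
using namespace std;

int main() {
    // length() counts bytes, so the result depends on the encoding:
    // 9 under EUC-KR (2 bytes per syllable), 12 under UTF-8 (3 bytes).
    string s = string("a가b나c다");
    cout << s.length() << "\n";

    // A char32_t string stores one unit per code point, so it counts
    // characters: prints 6 regardless of how the bytes were encoded.
    u32string u = U"a가b나c다";
    cout << u.length() << "\n";
}
```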
But my question is: how does the C++ runtime distinguish between the two different types of characters?
For example, if I make a char array of size 9, how does it know whether it holds 9 ASCII characters or 4 multi-byte characters plus 1 ASCII character?
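To make the ambiguity concrete: the array really is just raw bytes, and whoever consumes them decides how to group them. A sketch using explicit byte values (EUC-KR, which is what the byte dump further down suggests my environment uses):

```cpp
#include <iostream>
using namespace std;

int main() {
    // The same 9 bytes as "a가b나c다" in EUC-KR, spelled out explicitly.
    // Nothing in the array itself says "ASCII" or "Korean".
    const char bytes[] = "a" "\xB0\xA1" "b" "\xB3\xAA" "c" "\xB4\xD9";
    cout << bytes << "\n";  // cout just streams the bytes; the terminal
                            // decides whether they render as 6 glyphs or 9
}
```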
And then I figured out this:
```cpp
#include <cstring>
#include <iostream>
using namespace std;

int main() {
    char c;
    int a;
    const char* cp = "가나다라마바사아";
    // The original loop ran to i < 20, which reads past the end of the
    // string (undefined behavior); stopping at strlen(cp) is safe.
    for (size_t i = 0; i < strlen(cp); i++) {
        c = a = cp[i];
        cout << "\n c val : " << c;
        cout << "\n a val : " << a;
    }
}
```

This ONLY prints out negative values for a:
```
 c val :
 a val : -80
 c val :
 a val : -95
 c val :
 a val : -77
 c val :
 a val : -86
 c val :
 a val : -76
 c val :
 a val : -39
```

(each c prints as an unprintable glyph). From this I can infer that for non-ASCII characters only negative values are used? But isn't this quite a waste?
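Those negative numbers are just the high bytes of the encoding seen through a (signed) char; casting to unsigned char shows the actual byte values. A quick sketch:

```cpp
#include <iostream>
using namespace std;

int main() {
    const char* cp = "가나다";  // 2 bytes per syllable here (EUC-KR assumed)
    for (int i = 0; cp[i] != '\0'; i++) {
        int as_signed   = cp[i];                              // sign-extended: -80, -95, ...
        int as_unsigned = static_cast<unsigned char>(cp[i]);  // 176, 161, ... (0xB0, 0xA1, ...)
        cout << as_signed << "  ->  0x" << hex << as_unsigned << dec << "\n";
    }
}
```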
My question in summary: does C++ distinguish ASCII characters and multi-byte Unicode characters only by checking whether they are negative?
Answer in summary: the decoder decides whether to treat 1~4 chars as a single character by looking at the first few bits of each char. In UTF-8, every byte of a multi-byte sequence has its high bit set, which is exactly why those bytes show up as negative in a signed char; so to some extent my assumption was valid.
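A minimal sketch of that rule for UTF-8 (the escape bytes below are the actual UTF-8 encoding of 가, 나, 다, so this behaves the same regardless of the source file's encoding):

```cpp
#include <iostream>
using namespace std;

// How many bytes a UTF-8 sequence starting with 'lead' occupies,
// determined purely from the first few bits of the lead byte.
int utf8_seq_len(unsigned char lead) {
    if (lead < 0x80)            return 1;  // 0xxxxxxx: ASCII
    if ((lead >> 5) == 0x06)    return 2;  // 110xxxxx
    if ((lead >> 4) == 0x0E)    return 3;  // 1110xxxx
    if ((lead >> 3) == 0x1E)    return 4;  // 11110xxx
    return 1;  // continuation or invalid byte; real code should report an error
}

int main() {
    // "a가b나c다" written as explicit UTF-8 bytes.
    const char* s = "a" "\xEA\xB0\x80" "b" "\xEB\x82\x98" "c" "\xEB\x8B\xA4";
    int chars = 0;
    for (int i = 0; s[i] != '\0'; ) {
        i += utf8_seq_len(static_cast<unsigned char>(s[i]));
        ++chars;
    }
    cout << chars << "\n";  // prints 6, even though the string is 12 bytes
}
```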