8
  • 17
    @dave: I don't know what headache UTF-8 creates that is greater than that of wide chars (UTF-16). In UTF-16, you also have characters that span multiple code units. Commented Dec 29, 2009 at 16:08
  • The problem is that if you're anywhere but an English-speaking country, you OUGHT to use wchar_t. Not to mention that some alphabets have way more characters than you can fit into a byte. We were there, on DOS. Codepage schizophrenia? No, thanks, no more. Commented Nov 26, 2016 at 23:02
  • 1
    @Swift The problem with wchar_t is that its size and meaning are OS-specific. It just swaps the old problems for new ones. Whereas a char is a char regardless of OS (on similar platforms, at least). So we might as well just use UTF-8, pack everything into sequences of chars, and lament how C++ leaves us completely on our own without any standard methods for measuring, indexing, finding, etc. within such sequences. Commented May 21, 2017 at 14:16
  • 1
    @Swift You seem to have it completely backwards. wchar_t is a fixed-width data type, so an array of 10 wchar_t will always occupy sizeof(wchar_t) * 10 platform bytes. And UTF-16 is a variable-width encoding in which characters may be made up of 1 or 2 16-bit code units (likewise UTF-8, with 1 to 4 8-bit code units). Commented May 21, 2017 at 14:42
  • 1
    @SteveHollasch The wchar_t representation of a string on Windows encodes characters greater than FFFF as a special surrogate pair; others take only one wchar_t element. So that representation will not be compatible with the one created by the GNU compiler, where wchar_t is 32 bits wide (so all characters less than FFFF have a zero word in front of them). What is stored in wchar_t is determined by the programmer and the compiler, not by some agreement. Commented Nov 5, 2017 at 0:33
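
To make the points in these comments concrete, here is a minimal C++11 sketch of the two mechanisms discussed: how a code point above FFFF turns into a UTF-16 surrogate pair, and how one might count code points in a UTF-8 sequence of chars by hand, since the standard library offers no such helper. The function names `to_utf16` and `utf8_code_points` are hypothetical, chosen only for illustration, and the UTF-8 counter assumes well-formed input and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Code points above U+FFFF become a surrogate pair (two 16-bit units).
std::u16string to_utf16(char32_t cp) {
    // Precondition: a valid scalar value (in range, not itself a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    cp -= 0x10000;  // leaves a 20-bit value to split across two units
    std::u16string out;
    out += static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
    out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    return out;
}

// Hypothetical helper: count code points in a UTF-8 string by counting
// every byte that is not a continuation byte (bit pattern 10xxxxxx).
// Assumes well-formed UTF-8; malformed input is not detected.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For example, `to_utf16(0x1F600)` yields the two units D83D and DE00, while `to_utf16(0x41)` yields a single unit. On a platform with a 32-bit wchar_t (typical for GCC on Linux) the same code point fits in one element, whereas with a 16-bit wchar_t (Windows) it needs the pair, which is the incompatibility described above.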