12

To find out whether C++ is the right language for a project of mine, I want to test its UTF-8 capabilities. Following some references, I built this example:

    #include <string>
    #include <iostream>
    using namespace std;

    int main()
    {
        wstring str;
        while(getline(wcin, str))
        {
            wcout << str << endl;
            if(str.empty()) break;
        }
        return 0;
    }

But when I type in a UTF-8 character, it misbehaves:

    $ > ./utf8
    Hello
    Hello
    für
    f
    $ >

Not only does it fail to print the ü, it also quits immediately. gdb told me there was no crash, just a normal exit, yet I find that hard to believe.

  • Which platform are you targeting (Windows, Linux, etc)? Commented Dec 14, 2011 at 23:35
  • Linux, actually. If it works on windows, too, that is kind of a bonus. Commented Dec 14, 2011 at 23:39
  • Is your locale set to a UTF-8 encoding? Commented Dec 14, 2011 at 23:47
  • Doesn't necessarily follow. Anyway, it works with normal string, cin, cout, not with the w... versions here, I suspect they want UTF-32 (or 16?). Commented Dec 14, 2011 at 23:54
  • Some of my previous questions on the topic: #1, #2, #3 Commented Dec 14, 2011 at 23:55

3 Answers

10

Don't use wstring on Linux.

std::wstring VS std::string

Take a look at the first answer there. I'm sure it answers your question.

  1. When I should use std::wstring over std::string?

On Linux? Almost never (§).

On Windows? Almost always (§).
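As a rough illustration of the narrow-string route, here is a minimal sketch that mirrors the loop from the question with std::string instead of std::wstring. The names echo_lines and echo_text are hypothetical helpers made up for this example. The sketch assumes the terminal sends and expects UTF-8; the program itself only passes bytes through and never decodes them.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: echo each line, stopping at EOF or after the first
// empty line (same loop shape as in the question).
// std::string holds the raw UTF-8 bytes; nothing is decoded or converted.
void echo_lines(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        out << line << '\n';
        if (line.empty()) break;
    }
}

// Hypothetical wrapper so the helper can be tried on an in-memory string.
std::string echo_text(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    echo_lines(in, out);
    return out.str();
}
```

From main you would call echo_lines(std::cin, std::cout). The two bytes of "ü" (0xC3 0xBC) then go in and come out unchanged.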


3 Comments

+1 : Take a look at this answer. I'm sure it links to an answer to your question.
In the boost::spirit comments on UTF-8 they're always talking about using wchar_t.
@Scán: I'd guess they use wchar_t all the time for code points, used when translating UTF-8 to and from anything. wchar_t is not a good character type for UTF-8 itself though.
10

The language itself has nothing to do with Unicode or any other character encoding; that is tied to the operating system. Windows uses UTF-16 for its Unicode support, which implies wide characters (16-bit wchar_t) and std::wstring. The Windows API functions that operate on Unicode strings (the W variants) take wide-character input.

Unix-based systems such as Mac OS X and Linux, by contrast, use UTF-8. Of course, the encoding is only a matter of how you interpret the bytes in the array, so you can even store a UTF-16 string in a plain C array or a std::string container. This is why you typically do not see wstring in cross-platform code; instead, all strings are handled as UTF-8 and re-encoded to UTF-16 when necessary (on Windows).
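To make the "it is only bytes" point concrete, here is a small sketch. The function name count_code_points is made up for this example, and it assumes the input is valid UTF-8. It counts code points by skipping continuation bytes, which all have the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a UTF-8 encoded
// std::string. Every byte that is not a continuation byte (10xxxxxx)
// starts a new code point. Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the string "für" written as the bytes "f\xc3\xbcr", size() reports 4 bytes while count_code_points reports 3 code points. The container does not care about the difference; only the code that interprets the bytes does.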

There are several ways to handle this somewhat confusing topic. I personally do it as described above: I use UTF-8 strictly throughout the application, re-encode strings when interacting with the Windows API, and use them directly on Mac OS X. For the re-encoding on Windows I use these conversion helpers:

C++ UTF-8 Conversion Helpers (on MSDN, available under the Apache License, Version 2.0).
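To give an idea of what such a re-encoding step involves, here is a hand-written sketch. It is not the API of the MSDN helpers or of Qt; utf8_to_utf16 is a hypothetical name chosen for this example. It rejects malformed lead bytes, truncated sequences, and surrogate code points, but it does not reject overlong encodings, so treat it as an illustration rather than a validated converter.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes to code points and re-encode
// them as UTF-16 code units (surrogate pairs above U+FFFF).
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split the remaining 20 bits into two halves
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting code units could be copied into a std::wstring before being passed to a wide-character API. On Linux the UTF-8 std::string would be used directly and this step skipped.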

You can also use Qt's cross-platform QString, which provides conversion functions between UTF-8, UTF-16, and other encodings (ANSI, Latin...).

So the answer above is right: on Unix always use UTF-8 (std::string, char), on Windows UTF-16 (std::wstring, wchar_t).

2 Comments

So what do you propose I do if I want to write a language compiler/interpreter that treats everything as UTF-8 on both systems?
Well, there is no simple answer and no "ultimate" solution. It depends on which compilers, IDEs, and APIs you use. I would recommend a cross-platform application framework, ideally Qt by Nokia (qt.nokia.com). It is completely free for open source projects, and even for commercial ones if you comply with the GNU Lesser General Public License (LGPL).
4

Remember that at program startup the "C" locale is selected by default. You probably don't want this if you handle UTF-8. Calling setlocale(LC_CTYPE, "") replaces this default with whatever is defined in the environment (presumably a UTF-8 locale).
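Applied to the loop from the question, that advice might look like the sketch below. The helper names use_environment_locale, echo_wide_lines, and echo_wide_text are hypothetical. Whether std::wcin and std::wcout then really decode and encode UTF-8 depends on the locale configured in the environment and on the standard library in use, so take this as a starting point, not a guarantee.

```cpp
#include <clocale>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: replace the default "C" locale with the one named
// by the environment (e.g. LANG=en_US.UTF-8). Returns false if that
// locale is not available on the system.
bool use_environment_locale() {
    return std::setlocale(LC_CTYPE, "") != nullptr;
}

// Hypothetical helper: the loop from the question, written against
// generic wide streams so it can also be tried on in-memory streams.
void echo_wide_lines(std::wistream& in, std::wostream& out) {
    std::wstring line;
    while (std::getline(in, line)) {
        out << line << L'\n';
        if (line.empty()) break;
    }
}

// Hypothetical wrapper that runs the loop on an in-memory wide string;
// no locale conversion takes place on this path.
std::wstring echo_wide_text(const std::wstring& input) {
    std::wistringstream in(input);
    std::wostringstream out;
    echo_wide_lines(in, out);
    return out.str();
}
```

In main you would call use_environment_locale() before the first read or write, then echo_wide_lines(std::wcin, std::wcout). The order matters, because the locale has to be in place before the wide streams are first used.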

1 Comment

Yes! Contrary to some other answers, it is perfectly OK to use wchar_t on Linux. You absolutely have to use the right locale though.
