See my answer at std::wstring VS std::string for a complete difference between std::string and std::wstring.
##Edit : REJOICE!!!
Nine hours ago, someone (probably the one who downvoted every answer but Pavel Radzivilovsky's one) downvoted this answer. Of course, without any comment pointing to what's wrong with my answer.
\o/
###3.a - But Windows is supposed to not handle UTF-16 correctly
The "canonical" example I saw described was the EDIT Win32 control, which is supposedly unable to correctly backspace over a non-BMP UTF-16 character on Windows (note that I did not verify the bug, I just don't care enough).
This is a Microsoft issue. Nothing you decide in your code will change whether this bug exists in the Win32 API. So using UTF-8 chars on Windows won't correct the bug in the EDIT control. The only thing you can hope to do is to create your own EDIT control (subclass it and handle the BACKSPACE event correctly?) or your own conversion functions.
Don't mix two different problems: a supposed bug in the Windows API, and your own code. Nothing in your own code will avoid the bug in the Windows API unless you do NOT use the supposedly bugged Windows API.
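To make the "handle the BACKSPACE event correctly" idea concrete, here is a minimal sketch of the logic such a control would need. It is not Win32 code: `backspace_code_point` is a hypothetical helper working on a plain `std::u16string` buffer with a caret index counted in UTF-16 code units, and it only shows how to avoid cutting a surrogate pair in half.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// High (leading) surrogates are U+D800..U+DBFF.
inline bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// Low (trailing) surrogates are U+DC00..U+DFFF.
inline bool is_low_surrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper (not part of any Windows API): erases the code point
// that ends just before `caret` and returns the new caret position.
// `caret` is an index in UTF-16 code units and must not exceed text.size().
std::size_t backspace_code_point(std::u16string& text, std::size_t caret)
{
    assert(caret <= text.size());
    if (caret == 0)
        return 0; // nothing to erase at the start of the buffer

    std::size_t count = 1; // a BMP character is a single code unit
    if (caret >= 2 &&
        is_low_surrogate(text[caret - 1]) &&
        is_high_surrogate(text[caret - 2]))
    {
        count = 2; // a non-BMP character is a surrogate pair: erase both units
    }

    text.erase(caret - count, count);
    return caret - count;
}
```

Note that this erases one code point, not one user-perceived character: combining sequences would need more work than this sketch does.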
###3.b - But UTF-16 on Windows, UTF-8 on Linux, isn't that complicated?
Yes, it could lead to bugs on some platforms that won't happen on others, if you assume too much about characters.
I assumed your primary platform was Windows (or that you wanted to provide a library for both wchar_t and char users).
But if this is not the case, if Windows is not your primary platform, then there is the solution of assuming that all your char and std::string will contain UTF-8 characters, unless told otherwise. You'll then need to wrap APIs to make sure that your char UTF-8 string will not be mistaken for an ANSI (or other codepaged) char string on Windows. For example, the file names passed to the stdio.h and iostream libraries will be assumed to be codepaged, as will the strings passed to the ANSI version of the Win32 API (CreateWindowA, for example).
This is the approach of GTK+, which uses UTF-8 characters, but not, surprisingly, of Qt (upon which Linux KDE is built), which uses UTF-16.
Sources:
- Qt: http://doc.qt.nokia.com/4.6/qstring.html#details
- GTK+: http://www.gtk.org/api/2.6/glib/glib-Character-Set-Conversion.html#filename-utf-8
Still, it won't protect you from the "Hey, but Win32 edit controls don't handle my Unicode characters!" problem, so you'll still have to subclass that control to get the desired behaviour (if the bug still exists)...
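As an illustration of the "wrap APIs" idea for the UTF-8-everywhere approach, here is a minimal, portable sketch of the conversion such a wrapper would perform before calling the wide version of an API. `utf8_to_utf16` is a hypothetical helper, not a library function, and replacing malformed input with U+FFFD (instead of reporting an error) is my assumption. On Windows you could also get the same result from `MultiByteToWideChar` with `CP_UTF8`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: converts a UTF-8 std::string to UTF-16.
// Malformed, overlong or out-of-range sequences become U+FFFD.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t need = 0;   // continuation bytes expected after b0
        char32_t cp = 0xFFFD;   // default for an invalid lead byte
        char32_t min_cp = 0;    // smallest value allowed for this length

        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { need = 1; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { need = 2; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { need = 3; cp = b0 & 0x07; min_cp = 0x10000; }

        std::size_t used = 1;   // bytes consumed, always at least the lead byte
        bool ok = true;
        for (; used <= need; ++used) {
            if (i + used >= in.size()) { ok = false; break; }
            const unsigned char b = static_cast<unsigned char>(in[i + used]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok || cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        i += used;

        assert(cp <= 0x10FFFF); // cp is now a valid Unicode scalar value
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Non-BMP code point: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A wrapper around, say, CreateWindowW would call this on its UTF-8 arguments first. Since wchar_t is 16 bits wide on Windows, the resulting code units can be copied into a std::wstring there; on Linux, where wchar_t is usually 32 bits, that copy would not be valid UTF-32.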