Converting unicode strings and vice-versa

Question

I'm fairly new to Unicode strings and pointers, and I have no idea how conversion from Unicode to ASCII and vice versa works. This is what I'm trying to do:

const wchar_t *p = L"This is a string";

If I wanted to convert it to char*, how would the conversion from wchar_t* to char* (and back) work?

Or, by value, how would I convert a wstring object to a string object and vice versa?

std::wstring wstr = L"This is a string";

If I'm correct, can you just copy the string to a new buffer without conversion?

Drew Dormann · Accepted Answer · 2019-05-01 18:49:35Z

23

In the future (VS 2010 already supports it), this will be possible in standard C++ (finally!):

#include <codecvt>
#include <locale>
#include <string>

std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
const std::wstring wide_string = L"This is a string";
const std::string utf8_string = converter.to_bytes(wide_string);

edited May 1, 2019 at 18:49

Drew Dormann

answered Jan 24, 2011 at 20:29

Philipp

3 Comments

Tyler Liu Over a year ago

I think there is a typo: std::wstring in the last line should be std::string.

Dan Nissenbaum Over a year ago

That the last line should be std::string: confirmed from en.cppreference.com/w/cpp/locale/wstring_convert/to_bytes

AlastairG Jan 27 at 8:46

And a bit further in the future, it won't. Deprecated in C++17, removed in C++26.
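Given the deprecation mentioned in the last comment, one option that does not depend on std::wstring_convert is to encode UTF-8 by hand. The following is only a minimal sketch: to_utf8 is a hypothetical helper name, and it assumes the input is already a sequence of Unicode code points held in a std::u32string (rather than a wchar_t string, whose width differs between platforms). Invalid code points are replaced with U+FFFD, which is a design choice of this sketch and not something the answer prescribes.

```cpp
#include <string>

// Hypothetical helper (not part of the standard library): encodes a sequence
// of Unicode code points as UTF-8. Surrogates and values above U+10FFFF are
// replaced with U+FFFD, the Unicode replacement character.
std::string to_utf8(const std::u32string& code_points) {
    std::string out;
    for (char32_t cp : code_points) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, to_utf8(U"\u6C49") yields the three bytes E6 B1 89. On platforms where wchar_t is 16 bits wide, a wide string would first have to be decoded from UTF-16 into code points, which this sketch does not cover.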

MSalters · Accepted Answer · 2011-01-25 09:51:16Z

The conversion from ASCII to Unicode and vice versa is quite trivial. By design, the first 128 Unicode code points are the same as ASCII (in fact, the first 256 are equal to ISO-8859-1).

So the following code works on systems where char is ASCII and wchar_t is Unicode:

#include <cstring>
#include <string>

const char* ASCII = "Hello, world";
std::wstring Unicode(ASCII, ASCII + std::strlen(ASCII));

You can't reverse it this simply: 汉 exists in Unicode but not in ASCII, so how would you "convert" it?
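To make the asymmetry concrete, here is a rough sketch of both directions. The names widen_ascii and try_narrow_ascii are hypothetical, and the widening half assumes the input really is 7-bit ASCII. The narrowing half refuses to guess: it reports failure as soon as it meets a character outside the ASCII range, leaving it to the caller to decide what to do with something like 汉.

```cpp
#include <string>

// Widening: every ASCII value is also the matching Unicode code point,
// so each char can be copied into a wchar_t unchanged.
// Assumes the input contains only 7-bit ASCII.
std::wstring widen_ascii(const std::string& ascii) {
    return std::wstring(ascii.begin(), ascii.end());
}

// Narrowing: succeeds only when every character is in the ASCII range.
// Returns false and leaves `out` empty if any character does not fit.
bool try_narrow_ascii(const std::wstring& wide, std::string& out) {
    out.clear();
    for (wchar_t wc : wide) {
        // The cast makes the range check work whether wchar_t is signed or not.
        if (static_cast<unsigned long>(wc) > 0x7F) {
            out.clear();
            return false;
        }
        out += static_cast<char>(wc);
    }
    return true;
}
```

Returning a bool instead of silently substituting a placeholder is just one possible design; a lossy variant could write '?' for each character that does not fit.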

There is also from_bytes, which converts in the other direction. Given a std::string named utf8_string, you can use it like this:

std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
const std::wstring wide_string = converter.from_bytes(utf8_string);
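Putting to_bytes and from_bytes together gives a round trip. This is a sketch only: wide_to_utf8 and utf8_to_wide are hypothetical wrapper names, and it relies on std::wstring_convert and std::codecvt_utf8, which are deprecated since C++17, so a compiler may warn about them and a future standard library may not provide them at all.

```cpp
#include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
#include <locale>   // std::wstring_convert (deprecated since C++17)
#include <string>

// Wide string -> UTF-8 encoded narrow string.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// UTF-8 encoded narrow string -> wide string.
// Throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

A converter object keeps conversion state, so creating a fresh one per call, as done here, keeps the two functions independent of each other.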

Eugene Mayevski 'Callback · Accepted Answer · 2011-01-24 19:52:55Z

3

The solutions are platform-dependent. On Windows, use the MultiByteToWideChar and WideCharToMultiByte API functions. On Unix/Linux platforms, the iconv library is quite popular.

answered Jan 24, 2011 at 19:52

Eugene Mayevski 'Callback

1 Comment

Coder12345 Over a year ago

Beware that MultiByteToWideChar has a bug when converting codepage 50225 (Korean - ISO-2022-KR) which converts characters incorrectly as noted on support.microsoft.com/en-us/kb/960293 - The suggested workaround is to use IMultiLanguage::ConvertStringToUnicode instead which converts the same characters properly - please update answer to make this more visible.

Thomas · Accepted Answer · 2011-01-24 19:42:00Z

3

C++ by itself doesn't offer this functionality. You'll need a separate library, like libiconv.

answered Jan 24, 2011 at 19:42

Thomas

cpx · Accepted Answer · 2011-01-24 20:39:46Z

3

C Standard library functions: mbstowcs and wcstombs
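A minimal sketch of how these two functions might be wrapped for std::string and std::wstring follows. The names mb_to_wide and wide_to_mb are hypothetical. Both functions interpret the narrow string in the encoding of the current C locale, so non-ASCII text generally needs a suitable std::setlocale call first, and both stop at the first embedded null character.

```cpp
#include <cstdlib>    // std::mbstowcs, std::wcstombs, MB_CUR_MAX
#include <stdexcept>
#include <string>
#include <vector>

// Multibyte (current C locale encoding) -> wide string.
std::wstring mb_to_wide(const std::string& narrow) {
    // A multibyte string never yields more wide characters than it has bytes.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    const std::size_t written =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    }
    return std::wstring(buffer.data(), written);
}

// Wide string -> multibyte (current C locale encoding).
std::string wide_to_mb(const std::wstring& wide) {
    // Each wide character needs at most MB_CUR_MAX bytes in the current locale.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);
    const std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("character not representable in the current locale");
    }
    return std::string(buffer.data(), written);
}
```

Both C functions return (size_t)-1 on failure, which the wrappers turn into exceptions; returning an error code instead would work just as well.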

answered Jan 24, 2011 at 20:39

cpx

simon · Accepted Answer · 2015-11-13 09:47:51Z

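The widen() member function of a wide stream converts a char to a wchar_t. Note that it has to be called on a wide stream such as wcin, because cin.widen() returns a plain char:

#include <iostream>

char a = 'a';
wchar_t wa = std::wcin.widen(a);

Of course, to convert a whole string you have to put it into a loop and dereference the pointer for each character. The opposite is accomplished by narrow().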
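The loop mentioned above can be avoided with the range overloads of the std::ctype<wchar_t> facet, which is what a wide stream's widen() and narrow() delegate to. This is a sketch under assumptions: widen_string and narrow_string are hypothetical names, the classic "C" locale is used by default, and characters that cannot be narrowed are replaced by a caller-supplied fallback character.

```cpp
#include <locale>
#include <string>

// Widens every char of `narrow` using the ctype facet of `loc`.
std::wstring widen_string(const std::string& narrow,
                          const std::locale& loc = std::locale::classic()) {
    std::wstring wide(narrow.size(), L'\0');
    if (!narrow.empty()) {
        const auto& facet = std::use_facet<std::ctype<wchar_t>>(loc);
        facet.widen(narrow.data(), narrow.data() + narrow.size(), &wide[0]);
    }
    return wide;
}

// Narrows every wchar_t of `wide`; characters with no narrow equivalent
// in `loc` are replaced by `fallback`.
std::string narrow_string(const std::wstring& wide, char fallback = '?',
                          const std::locale& loc = std::locale::classic()) {
    std::string narrow(wide.size(), '\0');
    if (!wide.empty()) {
        const auto& facet = std::use_facet<std::ctype<wchar_t>>(loc);
        facet.narrow(wide.data(), wide.data() + wide.size(), fallback, &narrow[0]);
    }
    return narrow;
}
```

Like the single-character versions, these work one character at a time, so they are suitable for single-byte encodings and not for multibyte ones such as UTF-8.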
