
Is there any difference between these two string storage formats?


3 Answers


std::wstring is a container of wchar_t. The size of wchar_t is not specified—Windows compilers tend to use a 16-bit type, Unix compilers a 32-bit type.

UTF-16 is a way of encoding sequences of Unicode code points in sequences of 16-bit integers.

Using Visual Studio, if you use wide character literals (e.g. L"Hello World") that contain no characters outside of the BMP (the Basic Multilingual Plane, code points U+0000 through U+FFFF), you'll end up with UTF-16, but mostly the two concepts are unrelated. If you use characters outside the BMP, std::wstring will not translate surrogate pairs into Unicode code points for you, even if wchar_t is 16 bits.
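As a minimal sketch of that last point (assuming a 16-bit wchar_t as with Visual Studio, and a 32-bit wchar_t on a typical Linux compiler for comparison):

    #include <iostream>
    #include <string>

    int main() {
        // U+1F600 (GRINNING FACE) lies outside the BMP.
        std::wstring s = L"\U0001F600";

        // With a 16-bit wchar_t (e.g. Visual Studio) this prints 2: the literal
        // is encoded as a UTF-16 surrogate pair and the wstring stores the two
        // raw 16-bit units. With a 32-bit wchar_t (typical on Linux) it prints 1.
        std::cout << s.size() << '\n';

        // Either way, wstring never reassembles surrogate pairs into code
        // points for you; s[0] and s[1] are just the units that were inserted.
        return 0;
    }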


4 Comments

Do you mean that std::wstring is the same as UTF-16 only for non-BMP Unicode characters when used on the Windows operating system?
No. std::wstring is just a container of integers. The encoding of the container depends entirely on the data you insert into the container.
+1: For people unfamiliar with UTF it may be wise to define BMP.
Your last paragraph is the answer to my question. Thank you.

UTF-16 is a specific Unicode encoding. std::wstring is a string implementation that uses wchar_t as its underlying type for storing each character. (In contrast, regular std::string uses char).

The encoding used with wchar_t does not necessarily have to be UTF-16—it could also be UTF-32 for example.
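As a small illustration of that distinction (not from the original answer): wchar_t's width, and therefore wstring's natural encoding, is implementation-defined, whereas the char16_t/char32_t string types added in C++11 have literals with a guaranteed encoding.

    #include <iostream>
    #include <string>

    int main() {
        // wchar_t's width is implementation-defined: usually 2 bytes on
        // Windows and 4 bytes on Linux, so wstring's natural encoding varies.
        std::cout << sizeof(wchar_t) << '\n';

        // These C++11 types have literals with a guaranteed encoding.
        std::u16string utf16 = u"\U0001F600";  // UTF-16: surrogate pair, size() == 2
        std::u32string utf32 = U"\U0001F600";  // UTF-32: one code point, size() == 1
        std::cout << utf16.size() << ' ' << utf32.size() << '\n';
        return 0;
    }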

1 Comment

It could also be UCS-2 or S-JIS or Big 5 or ... well, anything.

UTF-16 is a way of representing text as a sequence of 16-bit elements, but an actual character may consist of more than one element.

std::wstring is just a collection of these elements, and is a class primarily concerned with their storage.

The elements of a wstring are of type wchar_t, which is at least 16 bits wide but could be 32 bits.

4 Comments

Can you please explain in more detail, e.g. by giving an example? For instance, the character 'A' is stored in a std::wstring as 0x0041. How is it stored in UTF-16 format?
16-byte ?? woah that's a hardcore character encoding
@Inverse: That's why everyone should just use ASCII, there wouldn't be so much grief on memory use ;)
For those who may not understand the humor in the above comments, UTF-16 is a 16-bit Unicode encoding. Also, in UTF-16, a character that needs more than one 16-bit element is encoded using a surrogate pair.
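To sketch the example asked for in the first comment (assuming a 16-bit wchar_t, as on Windows): for a BMP character such as 'A' the wstring element and the UTF-16 code unit are the same value; only characters outside the BMP need more than one element.

    #include <cstdint>
    #include <iostream>
    #include <string>

    int main() {
        // 'A' is U+0041: its UTF-16 encoding is the single unit 0x0041,
        // which is exactly the value stored in the wstring.
        std::wstring a = L"A";
        std::cout << std::hex << static_cast<std::uint32_t>(a[0]) << '\n';  // 41

        // U+10400 is outside the BMP: UTF-16 encodes it as the surrogate
        // pair 0xD801 0xDC00, so a 16-bit-wchar_t wstring holds two elements.
        std::wstring d = L"\U00010400";
        for (wchar_t c : d)
            std::cout << std::hex << static_cast<std::uint32_t>(c) << '\n';
        return 0;
    }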
