Conversion between wchar_t and char in ANSI Code Page

Question

Suppose I am in an environment that uses only the ANSI code page.

Is this conversion from wide char to char:

char ansi_cstr[size_of_ansi_str];
WideCharToMultiByte(CP_ACP, 0, ansi_wstr.c_str(), -1, ansi_cstr, size_of_ansi_str, 0, 0);
std::string ansi_str = std::string(ansi_cstr);

equivalent to the following?

std::string ansi_str = std::string(ansi_wstr.begin(), ansi_wstr.end());

And is this conversion from char to wide char:

wchar_t ansi_wcstr[size_of_ansi_str];
MultiByteToWideChar(CP_ACP, 0, ansi_str.c_str(), -1, ansi_wcstr, size_of_ansi_str);
std::wstring ansi_wstr = std::wstring(ansi_wcstr);

equivalent to this?

std::wstring ansi_wstr = std::wstring(ansi_str.begin(), ansi_str.end());

Do both pairs behave the same in an ANSI-codepage-only environment?

The real question is why would you be using ANSI code pages in the year 2014?
– Cody Gray ♦, Commented May 15, 2014 at 11:16
Even worse: an ANSI codepage only environment. I think the first Windows which had Unicode support (via Unicows) was Windows 95, so this would be Windows 3.1 from 1994. 20 years old. Talk about legacy development. Then again, we still see Turbo C++ questions around here.
– MSalters, Commented May 15, 2014 at 13:24
Windows 95/98/ME were Ansi-based OSes. UCS-2 was used in NT4, and then replaced with UTF-16 in Windows 2000. The two product lines were not merged together into a single Unicode OS until XP.
– Remy Lebeau, Commented May 16, 2014 at 1:20
@CodyGray Maybe because, even in 2018, so many shapefiles' DBFs still use 0x57 Language Driver ID?
– Rodrigo, Commented Nov 29, 2018 at 16:14

MSalters · Accepted Answer · 2014-05-15 10:06:15Z

There's no such thing as the ANSI code page environment. There are dozens.

Your two "shortcut" conversions are incorrect in all of them.

The conversion from ASCII char to UTF-16 wchar_t would work with your last method, but it fails for the upper half (0x80-0xFF) of most ANSI code pages. It works best with the Western European code page (Windows-1252), where it gets only about 32 characters wrong. For instance, the Euro sign € will always be mis-converted.
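To make the Euro example concrete, here is a small portable sketch. It assumes the input byte is 0x80, which is how Windows-1252 encodes €, and compares the element-wise widening against U+20AC, the code unit a real conversion should produce. The names `widen_by_cast`, `kEuroUtf16` and `kEuroCp1252` are made up for illustration.

```cpp
#include <string>

// Hypothetical helper: the "shortcut" from the question. It widens each
// char to wchar_t element by element and never consults a code page.
std::wstring widen_by_cast(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// UTF-16 code unit for the Euro sign (U+20AC).
const wchar_t kEuroUtf16 = 0x20AC;

// Windows-1252 stores the Euro sign as the single byte 0x80.
const std::string kEuroCp1252(1, static_cast<char>(0x80));
```

Calling `widen_by_cast(kEuroCp1252)` yields one wchar_t whose value is either 0x0080 or a sign-extended value such as 0xFF80, depending on whether char is signed on the platform. In neither case is it 0x20AC.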

Remy Lebeau · Accepted Answer · 2014-05-16 01:26:38Z

WideCharToMultiByte(CP_ACP, 0, ansi_wstr.c_str(), -1, ansi_cstr, size_of_ansi_str, 0, 0);

IS NOT the same as

std::string ansi_str = std::string(ansi_wstr.begin(), ansi_wstr.end());

WideCharToMultiByte() performs a real conversion from UTF-16 to ANSI using the codepage that CP_ACP refers to on that PC (which can differ from PC to PC, based on user locale settings). std::string(begin, end) merely loops through the source container, type-casting each element to char, and does not perform any codepage conversion at all.
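A minimal portable sketch of what the type-cast does in the narrowing direction. The helper name `narrow_by_cast` is hypothetical; the point is only that each wchar_t is truncated to a char, so on typical platforms only the low 8 bits survive.

```cpp
#include <string>

// Hypothetical helper: the narrowing "shortcut" from the question.
// Each wchar_t is converted to char on its own; no code page is involved.
std::string narrow_by_cast(const std::wstring& w) {
    return std::string(w.begin(), w.end());
}
```

For the Euro sign (U+20AC) this typically leaves the byte 0xAC, which is '¬' in Windows-1252, whereas a real conversion to Windows-1252 would produce 0x80.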

Likewise:

MultiByteToWideChar(CP_ACP, 0, ansi_str.c_str(), -1, ansi_wcstr, size_of_ansi_str);

IS NOT the same as

std::wstring ansi_wstr = std::wstring(ansi_str.begin(), ansi_str.end());

for the same reason: MultiByteToWideChar() performs a real conversion from ANSI to UTF-16 using the CP_ACP codepage, whereas std::wstring(begin, end) simply type-casts the source elements to wchar_t without any conversion at all.
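If you do want the real conversion, one common pattern is to call each API twice: once with a zero-sized output buffer to learn the required length, then again to convert. The sketch below is Windows-only (hence the `_WIN32` guard), and the helper names `to_ansi` and `from_ansi` are invented for illustration. It passes explicit lengths instead of -1, so no terminating null is counted or written.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Hypothetical helper: UTF-16 -> ANSI using the system code page (CP_ACP).
std::string to_ansi(const std::wstring& w) {
    if (w.empty()) return std::string();
    const int len = static_cast<int>(w.size());
    // First call: query how many bytes the converted text needs.
    const int bytes = WideCharToMultiByte(CP_ACP, 0, w.data(), len,
                                          nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return std::string();  // conversion failed
    std::string out(static_cast<size_t>(bytes), '\0');
    // Second call: convert into the correctly sized buffer.
    WideCharToMultiByte(CP_ACP, 0, w.data(), len,
                        &out[0], bytes, nullptr, nullptr);
    return out;
}

// Hypothetical helper: ANSI (CP_ACP) -> UTF-16.
std::wstring from_ansi(const std::string& s) {
    if (s.empty()) return std::wstring();
    const int len = static_cast<int>(s.size());
    // First call: query how many wchar_t units are needed.
    const int chars = MultiByteToWideChar(CP_ACP, 0, s.data(), len,
                                          nullptr, 0);
    if (chars <= 0) return std::wstring();  // conversion failed
    std::wstring out(static_cast<size_t>(chars), L'\0');
    // Second call: convert into the correctly sized buffer.
    MultiByteToWideChar(CP_ACP, 0, s.data(), len, &out[0], chars);
    return out;
}
#endif
```

Sizing the buffer from the API's own answer avoids guessing `size_of_ansi_str` up front, which matters for double-byte code pages where one wide character can need more than one byte.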

The type-casts would be equivalent to the API conversions ONLY if the source strings are using ASCII characters in the 0x00-0x7F range. But if they are using non-ASCII characters, all bets are off.
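If you want to rely on that ASCII-only case, you could check for it explicitly before taking the shortcut. This is just a sketch; `is_ascii` is a hypothetical helper, not part of the standard library or the Windows API.

```cpp
#include <string>

// Hypothetical helper: true if every char is in the ASCII range 0x00-0x7F.
bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}

// Same check for wide strings. The cast to unsigned long makes the test
// behave the same whether wchar_t is signed or unsigned on the platform.
bool is_ascii(const std::wstring& w) {
    for (wchar_t c : w) {
        if (static_cast<unsigned long>(c) > 0x7F) return false;
    }
    return true;
}
```

When `is_ascii` returns false, the type-cast shortcut is not safe and the input should go through MultiByteToWideChar() or WideCharToMultiByte() instead.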
