I am using two libraries: one stores UTF-8 strings in std::wstring, and the other stores UTF-8 strings in std::string.
What is the best (most efficient) way to pass strings between the two libraries?
I am currently on Windows using Visual C++ v9 Express, but would prefer a portable solution.
2 Answers
Assuming you mean UTF-16 and not UTF-8 for std::wstring, you will have to encode/decode the strings when passing them from one library to the other. I'm not sure what, if anything, the STL provides for that, but you can use the Windows MultiByteToWideChar() and WideCharToMultiByte() functions to convert between UTF-8 and UTF-16 in just a few lines of code. You can wrap them in your own functions so that you can replace the logic when you find something more portable, e.g.:
    #include <windows.h>
    #include <string>

    // UTF-8 -> UTF-16: the first call measures the output, the second converts.
    std::wstring Utf8ToUtf16(const std::string &s)
    {
        std::wstring ret;
        int len = MultiByteToWideChar(CP_UTF8, 0, s.c_str(), static_cast<int>(s.length()), NULL, 0);
        if (len > 0)
        {
            ret.resize(len);
            MultiByteToWideChar(CP_UTF8, 0, s.c_str(), static_cast<int>(s.length()), const_cast<wchar_t*>(ret.c_str()), len);
        }
        return ret;
    }

    // UTF-16 -> UTF-8: same two-call pattern in the other direction.
    std::string Utf16ToUtf8(const std::wstring &s)
    {
        std::string ret;
        int len = WideCharToMultiByte(CP_UTF8, 0, s.c_str(), static_cast<int>(s.length()), NULL, 0, NULL, NULL);
        if (len > 0)
        {
            ret.resize(len);
            WideCharToMultiByte(CP_UTF8, 0, s.c_str(), static_cast<int>(s.length()), const_cast<char*>(ret.c_str()), len, NULL, NULL);
        }
        return ret;
    }

5 Comments
rubenvb
Note that this is Windows specific, but UTF-16 hopefully means Windows here.
dalle
Will not compile as c_str returns a const C-string. But +1 for being on the right way.
Remy Lebeau
UTF-16 is not Windows-specific. The only Windows-specific portion is the API functions used. Like I said, this was just to demonstrate how to do it. When the OP finds a more portable solution, he/she can replace the API functions without having to rewrite the rest of his/her code.
Remy Lebeau
I added a couple of const_cast casts now.
dan04
UTF-16 itself is not Windows-specific, but the assumption that wchar_t is UTF-16 is Windows-specific.
Consider ICU. It is portable and has a lot of converters between encodings.
2 Comments
user754425
Too big for my current needs. My program is less than 400KB, statically linked to the runtime. ICU will likely more than double the size and I am not sure if I can statically link against it.
Kerrek SB
If not ICU then iconv. That's even POSIX.
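To make the iconv suggestion concrete, here is a minimal sketch of a UTF-8 to std::wstring conversion. It is not from the thread: the helper name Utf8ToWide is hypothetical, and it assumes the iconv implementation accepts the encoding names "UTF-8" and "WCHAR_T" (glibc and GNU libiconv do, but POSIX does not standardize encoding names). It also assumes the second parameter of iconv() is declared as char **, which is the POSIX signature; a few older platforms declare it as const char **.

```cpp
#include <iconv.h>   // POSIX iconv; some platforms need -liconv at link time

#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts UTF-8 to the platform's native wchar_t
// encoding (UTF-16 where wchar_t is 2 bytes, UTF-32 where it is 4).
// Assumption: iconv_open() accepts "WCHAR_T" and "UTF-8" as names.
std::wstring Utf8ToWide(const std::string &s)
{
    if (s.empty())
        return std::wstring();

    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // A UTF-8 string of N bytes never decodes to more than N wchar_t
    // units, so s.size() elements is a safe upper bound for the output.
    std::vector<wchar_t> out(s.size());

    char *in_ptr = const_cast<char *>(s.data());
    size_t in_left = s.size();
    char *out_ptr = reinterpret_cast<char *>(&out[0]);
    size_t out_left = out.size() * sizeof(wchar_t);

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error("iconv failed: invalid or truncated UTF-8");

    // out_left is the number of unused bytes remaining in the buffer.
    size_t written = out.size() - out_left / sizeof(wchar_t);
    return std::wstring(out.begin(), out.begin() + written);
}
```

Because the target is "WCHAR_T" rather than a fixed "UTF-16LE", the result follows whatever wchar_t means on the platform, which sidesteps the assumption that wchar_t is UTF-16. The reverse direction would swap the two names passed to iconv_open() and size the output buffer for up to 4 bytes per input unit.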
std::wstring for that, then it is likely using/expecting UTF-16. Which makes sense, as UTF-8 and UTF-16 are just different encodings of the same Unicode character set. The database could be using any charset it wanted other than UTF-8, and ODBC would likely handle it internally and still utilize UTF-16 when passing data to/from you for consistency.
à using your ODBC library, what is the decimal or hex value of wstring[0]?
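One way to answer the wstring[0] question is to print every element of the string in hex. The sketch below is only an illustration; DumpCodeUnits is a hypothetical name, not something from either library.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical diagnostic helper: returns each wchar_t of the string as a
// zero-padded, uppercase hex value, separated by spaces.
std::string DumpCodeUnits(const std::wstring &ws)
{
    std::ostringstream os;
    os << std::hex << std::uppercase << std::setfill('0');
    for (std::wstring::size_type i = 0; i < ws.size(); ++i)
    {
        if (i != 0)
            os << ' ';
        // Cast to an integer type so the value prints as a number,
        // not as a character.
        os << std::setw(4) << static_cast<unsigned long>(ws[i]);
    }
    return os.str();
}
```

For a string holding only à (U+00E0), a properly decoded UTF-16 or UTF-32 wstring gives "00E0". If the library has instead copied the raw UTF-8 bytes into the wstring one byte per element, the result is "00C3 00A0", because à is encoded in UTF-8 as the two bytes C3 A0.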