
I am using two libraries: one stores UTF-8 strings in std::wstring, and the other stores strings (UTF-8) in std::string.
What is the best/most efficient way to pass strings between the two libraries?
I am currently on Windows using Visual C++ v9 Express but would prefer a portable solution.

  • When you say "stores UTF-8 string in std::wstring", what are you referring to exactly? Did you actually mean UTF-16? std::wstring is not suitable for storing UTF-8 octets (but std::string is). Commented Jul 28, 2011 at 18:53
  • @Remy Lebeau It is an ODBC library that retrieves UTF-8 data from a database and delivers the data in std::wstrings. How the data is actually stored inside the library I really don't know. Commented Jul 28, 2011 at 19:01
  • It does not matter how the library manipulates data internally. What is important is how it passes that data to/from your code. If it uses std::wstring for that, then it is likely using/expecting UTF-16. That makes sense, as UTF-8 and UTF-16 are just different encodings of the same Unicode character set. The database could be using any charset, not just UTF-8, and ODBC would likely convert it internally and still use UTF-16 when passing data to/from you, for consistency. Commented Jul 28, 2011 at 20:22
  • If you retrieve a character outside the ASCII range (e.g. à) using your ODBC library, what is the decimal or hex value of wstring[0]? Commented Jul 28, 2011 at 21:38
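
One way to answer the last question empirically is to dump the code units the library hands back. The sketch below is illustrative only: HexUnits and HexBytes are hypothetical helper names, and "à" (U+00E0) is used as the test character because its UTF-8 form (0xC3 0xA0) and its UTF-16 form (0x00E0) are easy to tell apart.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format every wchar_t of a wide string as a 4-digit
// hex code unit, separated by spaces (U+00E0 -> "0x00E0").
std::string HexUnits(const std::wstring &ws) {
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::wstring::size_type i = 0; i < ws.size(); ++i) {
        if (i != 0) out << ' ';
        out << "0x" << std::setw(4) << static_cast<unsigned long>(ws[i]);
    }
    return out.str();
}

// Hypothetical helper: format every byte of a narrow string in hex
// (the UTF-8 encoding of U+00E0 -> "0xC3 0xA0").
std::string HexBytes(const std::string &s) {
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        out << "0x" << std::setw(2)
            << static_cast<unsigned>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}
```

If HexUnits on the wstring returned for "à" gives 0x00E0, the library is handing out UTF-16 (or UTF-32 where wchar_t is 32 bits). If it gives 0x00C3 0x00A0, that would suggest each UTF-8 byte has been widened into its own wchar_t, and a simple byte-by-byte narrowing, rather than a UTF-16 conversion, would be what is needed.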

2 Answers


Assuming you mean UTF-16 and not UTF-8 for std::wstring, you will have to encode/decode the strings when passing them from one library to the other. I'm not sure if/what the STL provides for that, but on Windows you can use the MultiByteToWideChar() and WideCharToMultiByte() API functions to convert between UTF-8 and UTF-16 with just a few lines of code. You could then wrap that in your own functions, so you can replace the logic when you find something more portable, e.g.:

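    #include <windows.h>
    #include <string>

    // Convert UTF-8 to UTF-16 using the Win32 API (wchar_t is 16 bits on Windows).
    std::wstring Utf8ToUtf16(const std::string &s)
    {
        std::wstring ret;
        // First call: ask only for the required length, in wchar_t units.
        int len = MultiByteToWideChar(CP_UTF8, 0, s.c_str(), static_cast<int>(s.length()), NULL, 0);
        if (len > 0)
        {
            ret.resize(len);
            // Second call: convert into the string's own buffer. Writing through &ret[0]
            // avoids the undefined behavior of writing through a const_cast of c_str().
            MultiByteToWideChar(CP_UTF8, 0, s.c_str(), static_cast<int>(s.length()), &ret[0], len);
        }
        return ret;
    }

    // Convert UTF-16 to UTF-8 using the Win32 API.
    std::string Utf16ToUtf8(const std::wstring &s)
    {
        std::string ret;
        int len = WideCharToMultiByte(CP_UTF8, 0, s.c_str(), static_cast<int>(s.length()), NULL, 0, NULL, NULL);
        if (len > 0)
        {
            ret.resize(len);
            WideCharToMultiByte(CP_UTF8, 0, s.c_str(), static_cast<int>(s.length()), &ret[0], len, NULL, NULL);
        }
        return ret;
    }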
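
If a portable replacement for the Win32 calls is wanted without pulling in a library, the conversion can also be written by hand and placed behind the Utf8ToUtf16/Utf16ToUtf8 wrappers. This is a minimal sketch, not a hardened implementation: Utf8ToWide, WideToUtf8 and AppendCodePoint are hypothetical names, malformed input is replaced with U+FFFD instead of being reported, and the std::wstring encoding is chosen from sizeof(wchar_t) (UTF-16 when wchar_t is 2 bytes, as on Windows; UTF-32 when it is 4 bytes, as with typical Linux and macOS compilers).

```cpp
#include <string>

// Append one code point. With a 16-bit wchar_t, code points above U+FFFF
// become a UTF-16 surrogate pair; with a 32-bit wchar_t they are stored directly.
static void AppendCodePoint(std::wstring &out, unsigned long cp) {
    if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
        out += static_cast<wchar_t>(cp);
    } else {
        cp -= 0x10000;
        out += static_cast<wchar_t>(0xD800 + (cp >> 10));
        out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
    }
}

// Hypothetical UTF-8 -> std::wstring. Invalid lead bytes, truncated sequences,
// overlong forms, surrogates and values above U+10FFFF become U+FFFD.
std::wstring Utf8ToWide(const std::string &s) {
    static const unsigned long kMinimum[4] = {0, 0x80, 0x800, 0x10000};
    std::wstring out;
    std::string::size_type i = 0;
    const std::string::size_type n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        unsigned long cp;
        int extra;  // continuation bytes expected after the lead byte
        if (c < 0x80)                   { cp = c;        extra = 0; }
        else if (c >= 0xC2 && c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c >= 0xE0 && c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xF0 && c < 0xF5) { cp = c & 0x07; extra = 3; }
        else { AppendCodePoint(out, 0xFFFD); ++i; continue; }

        std::string::size_type j = i + 1;
        int k = 0;
        for (; k < extra && j < n; ++k, ++j) {
            const unsigned char cc = static_cast<unsigned char>(s[j]);
            if ((cc & 0xC0) != 0x80) break;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (k != extra || cp < kMinimum[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }
        AppendCodePoint(out, cp);
        i = j;  // resume after the consumed bytes, or at the offending byte
    }
    return out;
}

// Hypothetical std::wstring -> UTF-8. A valid surrogate pair is combined;
// lone surrogates and out-of-range values become U+FFFD.
std::string WideToUtf8(const std::wstring &ws) {
    std::string out;
    for (std::wstring::size_type i = 0; i < ws.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(ws[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            const unsigned long lo = static_cast<unsigned long>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) cp = 0xFFFD;
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Replacing bad input with U+FFFD keeps the functions total, which suits a drop-in wrapper; code that must reject bad data would need to signal an error instead.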

5 Comments

Note that this is Windows-specific, but UTF-16 hopefully means Windows here.
This will not compile, as c_str() returns a const C string. But +1 for being on the right track.
UTF-16 is not Windows-specific. The only Windows-specific portion is the API functions used. Like I said, this was just to demonstrate how to do it. When the OP finds a more portable solution, he/she can replace the API functions without having to rewrite the rest of his/her code.
I have added a couple of const_casts now.
UTF-16 itself is not Windows-specific, but the assumption that wchar_t is UTF-16 is Windows-specific.

Consider ICU. It is portable and has converters for a large number of encodings.

2 Comments

Too big for my current needs. My program is less than 400KB, statically linked to the runtime. ICU would likely more than double the size, and I am not sure whether I can link against it statically.
If not ICU, then iconv. It is even part of POSIX.
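
For reference, a UTF-8 to UTF-16 conversion through iconv might look roughly like the sketch below. Assumptions: Utf8ToUtf16Iconv is a hypothetical name; the encoding names "UTF-8" and "UTF-16LE" are the ones glibc and GNU libiconv accept (POSIX specifies iconv_open()/iconv() but not the encoding names); and on Windows or macOS a separate libiconv would usually have to be linked. UTF-16LE is requested instead of plain "UTF-16" so that no byte-order mark is written and the byte order is known.

```cpp
#include <iconv.h>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: UTF-8 -> UTF-16 code units via iconv().
// Throws std::runtime_error on invalid or truncated UTF-8 input.
std::u16string Utf8ToUtf16Iconv(const std::string &s) {
    if (s.empty()) return std::u16string();

    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");  // (tocode, fromcode)
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::vector<char> in(s.begin(), s.end());  // glibc's iconv() takes a non-const char**
    std::vector<char> out(in.size() * 2);      // UTF-16 needs at most 2 bytes per UTF-8 byte
    char *inp = &in[0];
    char *outp = &out[0];
    std::size_t inleft = in.size();
    std::size_t outleft = out.size();

    const std::size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        throw std::runtime_error("iconv: invalid or incomplete UTF-8 input");

    // Rebuild 16-bit units from little-endian byte pairs, independent of host endianness.
    const std::size_t produced = out.size() - outleft;
    std::u16string result;
    for (std::size_t i = 0; i + 1 < produced; i += 2) {
        const unsigned lo = static_cast<unsigned char>(out[i]);
        const unsigned hi = static_cast<unsigned char>(out[i + 1]);
        result += static_cast<char16_t>(lo | (hi << 8));
    }
    return result;
}
```

Because iconv only produces bytes, the last loop assembles the code units explicitly rather than reinterpreting the buffer, which keeps the result correct on big-endian hosts too.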
