I currently have a std::string that contains this:

"\xa9 2006 FooWorld" 

Basically, it contains the © symbol. This string is passed to a method of an external API that expects UTF-8. How can I make this string UTF-8 compatible? I read that I could use std::wstring_convert, but I am not sure how to apply it in my case. Any suggestions would be appreciated.

  • For that one character it's probably not worth anything complicated. Just hardcode the utf-8 equivalent. utf8-chartable.de Commented Apr 5, 2018 at 0:24
  • The thing is it could be multiple characters Commented Apr 5, 2018 at 0:24
  • You should probably have that in the question. :) Personally, I'd use this: utfcpp.sourceforge.net Commented Apr 5, 2018 at 0:26
  • std::string stores bytes, not characters. So if you do not know the original encoding, no approach is guaranteed to work. If you know the original encoding is UTF-8, then you do not need anything extra, because, again, std::string stores the encoded bytes. Commented Apr 5, 2018 at 5:58
  • Maybe you want to read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Commented Apr 6, 2018 at 1:20
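
To make the point about encodings concrete: the byte 0xA9 is © in ISO-8859-1 (Latin-1), so if the string really is Latin-1, it can be converted by hand. Below is a minimal sketch under that assumption; latin1_to_utf8 is a hypothetical helper name, not part of any library, and it would produce wrong output for input in any other encoding.

```cpp
#include <string>

// Hypothetical helper. ASSUMPTION: `input` is ISO-8859-1 (Latin-1),
// i.e. every byte is exactly one code point in U+0000..U+00FF.
std::string latin1_to_utf8(const std::string& input) {
    std::string out;
    out.reserve(input.size() * 2);  // worst case: every byte becomes two
    for (unsigned char byte : input) {
        if (byte < 0x80) {
            // ASCII range is encoded identically in UTF-8.
            out.push_back(static_cast<char>(byte));
        } else {
            // U+0080..U+00FF becomes a two-byte sequence: 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            out.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return out;
}
```

With this, latin1_to_utf8("\xa9 2006 FooWorld") yields the bytes 0xC2 0xA9 followed by " 2006 FooWorld", which is the UTF-8 encoding of ©. For many source encodings a library (such as the utfcpp one mentioned above, or iconv/ICU) is likely the safer choice.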

2 Answers

That's simple: use a UTF-8 string literal:

u8"\u00A9 2006 FooWorld" 

That will result in a const char[] that is a properly encoded UTF-8 string. (Since C++20 the element type of a u8 literal is char8_t rather than char.)

3 Comments

For instance, if I have std::basic_string str = "\xa9 2006 FooWorld", how do I append u8 to it?
@MistyD: You change the code to read: std::string str = u8"\u00A9 2006 FooWorld". If you're not allowed to change the literal itself, then this is a duplicate as previously outlined.
When using any of the Unicode-aware literal prefixes, you can write the actual Unicode character instead of spelling out its code point or code units manually, e.g. u8"© 2006 FooWorld". Let the compiler do the work for you.
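
One caveat on the std::string str = u8"..." form suggested above: since C++20 a u8 literal is an array of char8_t, so that initialization no longer compiles as written. A minimal sketch of a workaround that behaves the same before and after C++20 follows; from_u8_literal is a hypothetical helper name introduced here for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the bytes of a u8 literal into a std::string.
// CharT is deduced as char (C++11 to C++17) or char8_t (C++20 and later).
template <typename CharT, std::size_t N>
std::string from_u8_literal(const CharT (&literal)[N]) {
    // N includes the terminating null, so copy N - 1 bytes.
    return std::string(reinterpret_cast<const char*>(literal), N - 1);
}
```

Usage would look like std::string str = from_u8_literal(u8"\u00A9 2006 FooWorld"); and the resulting string starts with the bytes 0xC2 0xA9.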

In C++11 and later, the best way to get a UTF-8 encoded string literal is to use the u8 prefix:

std::string str = u8"\u00A9 2006 FooWorld";

or:

std::string str = u8"© 2006 FooWorld";

However, you can use std::wstring_convert, too (especially if your input data is not a string literal). Note that std::wstring_convert and the <codecvt> facets are deprecated as of C++17:

#include <codecvt>
#include <locale>
#include <string>

std::wstring wstr = L"© 2006 FooWorld"; // or whatever...
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
std::string str = convert.to_bytes(wstr);
