
I want to use strings encoded in UTF-8 (I'm sorry if that's bad wording; please correct me so I understand the proper one). Also, I want my program to be cross-platform.

IIUC, the proper way to do this is to use std::wstring and then convert it to UTF-8. The trouble is that, I think, on Linux std::string is already encoded in UTF-8 (I may be wrong, though).

So what is the best way to create a UTF8 representation of std::{w}string with the least possible conditional code?

The strings are constants, they are hard coded and they will be used in the SQLite queries.

P.S.: I am going to try with Xcode 5, hoping that it is C++11 compliant.

  • What do you mean by "use"? Commented Jan 31, 2016 at 22:42
  • The encoding of a string is determined by the code that creates that string. Where are you getting these strings that you want to "use" in some unspecified fashion? And exactly how do you plan to "use" them? Commented Jan 31, 2016 at 23:26
  • @一二三, the SQLite API will accept the query string encoded as a UTF-8 string, in order to support non-English table and database names. Commented Jan 31, 2016 at 23:38
  • @igor: you didn't answer nicol's question: where do these strings come from? User input? Command-line arguments? Hard-coded as string literals? Something else? Commented Feb 1, 2016 at 0:11
  • Unfortunately, there is no useful Unicode support in standard C++. I believe the most common way to handle Unicode in C++ is ICU. Commented Feb 1, 2016 at 0:28

3 Answers


they are hard coded.

If all of the strings in question are hard-coded string literals, then you don't need anything special.

Using the u8 prefix when declaring such strings ensures that they are encoded in UTF-8 on every platform that supports this C++11 feature. The type of such a literal is const char[], just like a regular string literal (this holds for C++11 through C++17; C++20 changes it to const char8_t[]):

const char my_utf8_literal[] = u8"Some String."; 

Of course, these can be stored in std::string (not wstring) as well:

std::string my_utf8_string = u8"Some String."; 
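
As a rough illustration of the two declarations above (a sketch, not part of the original answer): the snippet below stores a u8 literal in a std::string and lets you inspect its bytes. The names utf8_literal and cafe are hypothetical. The helper exists only so that the same code also compiles under C++20, where u8 literals use char8_t; in C++11 through C++17 the plain assignment shown above is enough.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a standard function): copies a u8 literal into a
// std::string. The literal's element type is char in C++11..17 and char8_t
// in C++20, so the template accepts either and views the bytes as char.
template <typename CharT, std::size_t N>
std::string utf8_literal(const CharT (&lit)[N]) {
    static_assert(sizeof(CharT) == 1, "expected a u8 string literal");
    // N counts the terminating '\0', which std::string does not need.
    return std::string(reinterpret_cast<const char*>(&lit[0]), N - 1);
}

// "café" written with a universal character name, so the result does not
// depend on how the source file itself is encoded.
std::string cafe() {
    return utf8_literal(u8"caf\u00e9");
}
```

The returned string holds five bytes, because U+00E9 takes two bytes (0xC3 0xA9) in UTF-8, and cafe().c_str() can be handed to any C API that expects UTF-8.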

You said that your goal was to use them in SQLite queries and commands. In that case, it should be pretty easy to make everything work. You would be using SQLite's string formatting commands to build queries, and while they are blind to UTF-8, so long as all of your inputs are UTF-8, the outputs will also be valid UTF-8. So there shouldn't be any problems.
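
To make the "UTF-8 in, UTF-8 out" point concrete, here is a minimal sketch under stated assumptions. It builds a query by plain byte concatenation rather than through SQLite's own formatting functions, and both names (looks_like_utf8, build_select) are hypothetical. The check is structural only: it verifies lead and continuation bytes but does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: structural UTF-8 check. Each lead byte must be
// followed by the right number of 10xxxxxx continuation bytes.
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (lead < 0x80)                extra = 0;  // 0xxxxxxx (ASCII)
        else if ((lead & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;                          // stray continuation or invalid lead
        if (s.size() - i <= extra) return false;    // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: splices a UTF-8 table name into ASCII SQL text.
// The concatenation is byte-wise and knows nothing about UTF-8.
std::string build_select(const std::string& table) {
    return "SELECT * FROM \"" + table + "\";";
}
```

Because the SQL keywords are ASCII and the table name is already UTF-8, the concatenated result passes the same check; real code would also need to escape any double quotes inside the name.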


2 Comments

If I use std::string my_utf8_string = u8"Some String"; I will still be able to use my_utf8_string.c_str(), right? SQLite has a C interface, so...
@Igor: Yes. Only the machinery that interprets the string in some way (e.g. character classification, filenames, i/o) is affected.

For UTF-8 processing there's a library called tiny-utf8. It provides a drop-in replacement for std::string or, more specifically, std::u32string (::value_type is char32_t, but the data is stored as UTF-8 in chars). That's more or less the easiest way to handle UTF-8 in C++11.

The strings are constants, they are hard coded and they will be used in the SQLite queries.

If you have hard-coded strings, you just have to change the encoding of your source file to UTF-8 and prepend the U prefix to your string literals, from which you can then construct a utf8_string object to work with.

So what is the best way to create a UTF8 representation of std::{w}string with the least possible conditional code?

IMHO, if you are able to, don't work with wchar_t and wstring, since they are probably the most vaguely specified and platform-specific things in the C++ string library.

I hope this helped at least a little bit.

Cheers, Jakob

Comments


The question has changed since this answer was posted, adding that the strings are hard-coded literals to be used in SQL queries. For that, plain u8 string literals are a simple solution, and parts of this answer become irrelevant. I'm not going to chase the question through this or further changes.

Re

I want to use strings encoded in UTF-8 (I'm sorry if that's bad wording; please correct me so I understand the proper one). Also, I want my program to be cross-platform.

Then you're plain out of luck.

Microsoft's documentation explicitly states that their setlocale does not support UTF-8:

MSDN docs on setlocale:

The set of available locale names, languages, country/region codes, and code pages includes all those supported by the Windows NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8. If you provide a code page value of UTF-7 or UTF-8, setlocale will fail, returning NULL.


Heads-up: in spite of the fact that It Does Not Work™, and is explicitly documented as not working, there are numerous web sites and blogs, probably even books, that recommend the approach, in a sort of ostrich-like way. They often look authoritative. But the info is rubbish.


Re

what is the best way to create a UTF8 representation of std::{w}string with the least possible conditional code?

That depends on what you have available. The standard library offers std::codecvt. It's been asked about and answered before, e.g. in "Convert wstring to string encoded in UTF-8".
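
A minimal sketch of the standard-library route, under stated assumptions. It uses std::wstring_convert with std::codecvt_utf8, both of which exist since C++11 but are deprecated as of C++17, so a compiler may warn. The function name wide_to_utf8 is hypothetical. The sketch assumes wchar_t holds whole code points, as on Linux and macOS; where wchar_t is 16 bits wide (Windows), std::codecvt_utf8_utf16<wchar_t> would be the facet to try instead.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: converts a wide string to a UTF-8 encoded std::string.
// Assumption: each wchar_t is one code point (32-bit wchar_t platforms).
std::string wide_to_utf8(const std::wstring& ws) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(ws);  // throws std::range_error on bad input
}
```

For example, wide_to_utf8(L"caf\u00e9") yields the five bytes of "café" in UTF-8. The platform difference in wchar_t is exactly the kind of conditional code the question hoped to avoid, which is one reason to prefer u8 literals when the strings are hard-coded.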

10 Comments

@CheersandhthAlf, how portable is std::codecvt? Also is it C++11 only?
@Igor: std::codecvt has been there since C++98, but the UTF-8 support wasn't added until C++11. That's very portable, since it's part of the standard library.
What exactly does Windows not supporting the UTF-8 locale have to do with using UTF-8 in strings? It seems to me that this would only matter to code that's passing those strings to the Windows API, and cross-platform code, by definition is not doing so.
OK, but that doesn't really change the question. Using UTF-8 strings has nothing to do with locales. Not unless you want them to. Indeed, the OP never even mentioned locales.
That doesn't answer my question. I use UTF-8 just fine in projects that never even consider using C++'s crappy locale support.
