
I am using C++ on Windows. I have some data in a std::string that I want to write to a file with UTF-8 encoding. How do I do this?

  • What have you tried? All you need is basically file << string; Commented Apr 29, 2021 at 13:33
  • Do you need a BOM ( en.wikipedia.org/wiki/Byte_order_mark ) at the beginning of the file ? Commented Apr 29, 2021 at 13:34
  • Pretty sure that for a UTF-8 ofstream you can do std::basic_ofstream<char8_t> if that's what you're asking. Commented Apr 29, 2021 at 13:40
  • I have tried file << string. But when I check the encoding of the created file in Notepad, it shows ANSI, not UTF-8. Commented Apr 29, 2021 at 13:44
  • @VikasKakkar The encoding Notepad shows is the encoding it uses to interpret (and display) the data in your file. It doesn't tell you which encoding was used to generate the file. At a semantic level an encoding is just a convention; in reality, your file just contains bytes. Commented Apr 29, 2021 at 13:51

2 Answers


I have some data in a std::string that I want to write to a file with UTF-8 encoding. How do I do this?

If the string contains the text in UTF-8 encoding, then simply write the data. You can use std::ofstream for example.
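
A minimal sketch of "simply write the data", assuming the std::string already holds UTF-8 bytes. The helper name write_utf8_file is hypothetical. It opens the file in binary mode so the bytes land on disk unchanged; no encoding conversion happens anywhere.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the bytes of `utf8_text` to `path` unchanged.
// Assumes `utf8_text` already contains UTF-8; nothing is converted.
// std::ios::binary stops Windows from turning '\n' into "\r\n", so the
// file holds exactly the bytes that are in the string.
bool write_utf8_file(const std::string& path, const std::string& utf8_text) {
    std::ofstream file(path, std::ios::out | std::ios::binary);
    if (!file) {
        return false;  // could not open the file
    }
    file.write(utf8_text.data(),
               static_cast<std::streamsize>(utf8_text.size()));
    return static_cast<bool>(file);  // false if the write failed
}
```

For example, write_utf8_file("out.txt", "caf\xC3\xA9") writes the UTF-8 encoding of "café". Text mode (plain file << string) works too if you want Windows line endings; the encoding of the bytes is the same either way.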

If the string doesn't contain the data in UTF-8, then before writing you must first convert it from the encoding it is currently in. The C++ standard library doesn't have general character encoding conversion functions (disregarding a few that are deprecated). There's generally no guaranteed way to detect the current encoding either; you simply need to know it beforehand.
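
As a sketch of what such a conversion looks like, assume the source encoding is known to be ISO-8859-1 (Latin-1). That is the one case simple enough to do by hand, because every Latin-1 byte value equals its Unicode code point. The function name latin1_to_utf8 is hypothetical. Note that the Windows "ANSI" code page 1252 differs from Latin-1 in the range 0x80 to 0x9F, so for real Windows data the usual route is MultiByteToWideChar (from the source code page) followed by WideCharToMultiByte with CP_UTF8.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) text to UTF-8.
// Assumption: every byte of `latin1` is a Latin-1 character, so its value
// is also its Unicode code point (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // each byte becomes at most 2 bytes
    for (char ch : latin1) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            utf8 += static_cast<char>(c);  // ASCII: same single byte in UTF-8
        } else {
            // U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx
            utf8 += static_cast<char>(0xC0 | (c >> 6));
            utf8 += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return utf8;
}
```

The result can then be written with plain std::ofstream, since it is now UTF-8.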


But when I check the encoding of the created file in notepad, it is ANSI and not UTF-8

As I mentioned in the previous section about detecting the source encoding of a string, there is no guaranteed way to do that. Notepad doesn't have this superpower either. It probably uses simple rules to guess the encoding, and sometimes it guesses wrong.
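
Notepad's actual detection rules aren't described here, so the following is only a hypothetical illustration of the kind of simple rule a program might use: check whether the bytes form structurally valid UTF-8. Latin-1 or Windows-1252 text containing accented characters usually fails such a check, while pure ASCII always passes it, which is exactly why ASCII-only files are ambiguous. The function name looks_like_utf8 is an assumption, and the check is deliberately simplified.

```cpp
#include <cstddef>
#include <string>

// Hypothetical heuristic: returns true if `bytes` is structurally valid UTF-8.
// Simplified: it checks lead/continuation byte patterns and rejects the
// C0, C1 and F5..FF lead bytes, but does not catch every invalid sequence
// (e.g. encoded surrogates or some overlong 3/4-byte forms).
bool looks_like_utf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // continuation bytes expected after c
        if (c < 0x80) {
            extra = 0;  // 0xxxxxxx: ASCII
        } else if (c >= 0xC2 && c <= 0xDF) {
            extra = 1;  // 110xxxxx
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2;  // 1110xxxx
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3;  // 11110xxx
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (extra > bytes.size() - i - 1) {
            return false;  // sequence cut off at the end of the data
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) {
                return false;  // expected a 10xxxxxx continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```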

UTF-8 represents the characters of 7-bit ASCII exactly as ASCII itself does. (What Notepad calls "ANSI" is the system's legacy code page, such as Windows-1252, which also encodes those characters identically.) If your string contains only those characters, its UTF-8 encoding is byte-for-byte indistinguishable from ASCII. In such a case, Notepad is likely to guess wrong (although technically the guess is also correct, since that UTF-8 data is incidentally ASCII as well).
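
If you want tools like Notepad to recognize the file as UTF-8 even when it is ASCII-only, one option (raised in the BOM comment on the question) is to write the UTF-8 byte order mark EF BB BF first. A sketch, assuming the string already holds UTF-8; the helper name is hypothetical. The BOM is optional for UTF-8, and some non-Windows tools treat it as ordinary data, so only add it if the consumers of the file expect it.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the UTF-8 BOM (EF BB BF) followed by
// `utf8_text`. Assumes `utf8_text` is already UTF-8.
bool write_utf8_file_with_bom(const std::string& path,
                              const std::string& utf8_text) {
    std::ofstream file(path, std::ios::out | std::ios::binary);
    if (!file) {
        return false;  // could not open the file
    }
    const char bom[] = {'\xEF', '\xBB', '\xBF'};  // UTF-8 byte order mark
    file.write(bom, sizeof bom);
    file.write(utf8_text.data(),
               static_cast<std::streamsize>(utf8_text.size()));
    return static_cast<bool>(file);  // false if either write failed
}
```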


3 Comments

"C++ standard library doesn't have general character encoding conversion functions" - actually it does have a few, but they are not very good. And the one that would actually be useful here - std::wstring_convert with std::codecvt_utf8/_utf16 - is deprecated with no replacement in sight yet.
@RemyLebeau Why would std::codecvt_utf8/_utf16 or std::wstring_convert be useful for converting some narrow encoding stored in a std::string into another narrow encoding (specifically UTF-8)? Neither of them is UTF-16.
A narrow-to-narrow conversion requires an intermediate conversion to Unicode/UTF-16, so narrow -> Unicode/UTF-16 -> narrow/UTF-8. wstring_convert/codecvt is useful for that second step, at least.
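
A sketch of that second step (UTF-16 to UTF-8) using std::wstring_convert with std::codecvt_utf8_utf16. Both are deprecated since C++17 but still shipped by the major standard libraries, so the sketch silences the deprecation warnings. It uses char16_t rather than wchar_t because wchar_t is 32-bit on Linux. The first step (narrow code page to UTF-16) is not shown; on Windows that is typically MultiByteToWideChar. The function name is hypothetical.

```cpp
// Silences MSVC's deprecation warning for <codecvt>.
#define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING
#include <codecvt>
#include <locale>
#include <string>

#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif

// Hypothetical helper: UTF-16 -> UTF-8 via the deprecated but still
// available std::wstring_convert. Throws std::range_error on invalid input
// (e.g. an unpaired surrogate).
std::string utf16_to_utf8(const std::u16string& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```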

This is similar to "How do I write a UTF-8 encoded string to a file in windows, in C++".

Note that writing to files differs across platforms. On Windows you have CreateFile, WriteFile, ReadFile and CloseHandle, which are not limited to files and can also operate on device drivers, whereas on Linux you have a different set of functions (open, write, read, close). It's best to check the platform you intend to target (in your case, Windows).
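
A sketch contrasting the two routes, under the assumption that you want raw bytes written unchanged: the Win32 branch uses CreateFileA/WriteFile/CloseHandle, and every other platform falls back to std::ofstream. The helper name write_bytes is hypothetical, and CreateFileA interprets the path in the ANSI code page. Neither branch changes the encoding of the data; whatever bytes the string holds (UTF-8 or otherwise) are what end up in the file.

```cpp
#include <fstream>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: writes `bytes` to `path` unchanged, using the
// Win32 API on Windows and the portable std::ofstream elsewhere.
bool write_bytes(const std::string& path, const std::string& bytes) {
#ifdef _WIN32
    HANDLE h = CreateFileA(path.c_str(), GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) {
        return false;  // could not create or open the file
    }
    DWORD written = 0;
    const BOOL ok = WriteFile(h, bytes.data(),
                              static_cast<DWORD>(bytes.size()),
                              &written, nullptr);
    CloseHandle(h);
    return ok && written == bytes.size();
#else
    std::ofstream file(path, std::ios::out | std::ios::binary);
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);  // false if opening or writing failed
#endif
}
```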

1 Comment

Um, yes, there are platform-specific ways of managing files. But the C++ standard library has code for managing files that masks those differences so you don't have to write different code for different platforms.
