
I want to convert a wstring to UTF-8, but I want to use functions built into Linux.

Is there any built-in function that converts a wstring or wchar_t* to UTF-8 on Linux with a simple invocation?

Example:

```cpp
wstring str = L"file_name.txt";
wstring mode = L"a";                       // wide literal, to match wstring
fopen([FUNCTION](str), [FUNCTION](mode));  // Simple invoke.
cout << [FUNCTION](str);                   // Simple invoke.
```
  • What encoding are you assuming for wstring? Commented Sep 19, 2011 at 10:19
  • If you use std::string and print it to the console, the Linux terminal (at least on Ubuntu) will by default interpret it as UTF-8. Commented Sep 19, 2011 at 10:25
  • @Darcy: well, this is true if the current locale is UTF-8, which is the default on most current Linux distributions, but it's not guaranteed. Commented Sep 19, 2011 at 10:28
  • @DavidHeffernan: std::wstring on Linux is always UTF-32, isn't it? Commented Sep 26, 2011 at 22:19

4 Answers


If/when your compiler supports enough of C++11, you could use std::wstring_convert:

```cpp
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    // Converts between wchar_t (UCS-4/UTF-32 on Linux) and UTF-8 bytes.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8_conv;
    std::wstring str = L"file_name.txt";
    std::cout << utf8_conv.to_bytes(str) << '\n';
}
```

Tested with clang++ 2.9/libc++ on Linux and Visual Studio 2010 on Windows.
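To get the "simple invocation" the question asks for, the converter can be wrapped in a small helper. This is a minimal sketch; the name `to_utf8` is hypothetical, and `std::wstring_convert`/`std::codecvt_utf8` are deprecated since C++17, although mainstream standard libraries still provide them at the time of writing.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: converts a wide string to UTF-8 bytes.
// Assumes wchar_t holds UCS-4/UTF-32 code points, as on Linux.
std::string to_utf8(const std::wstring& ws)
{
    // A local converter keeps the helper safe to call from several threads.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(ws);  // throws std::range_error if conversion fails
}
```

With it, the question's example becomes `std::fopen(to_utf8(str).c_str(), to_utf8(mode).c_str())` and `std::cout << to_utf8(str);`.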


2 Comments

std::wbuffer_convert, std::wstring_convert, and the <codecvt> header (containing std::codecvt_mode, std::codecvt_utf8, std::codecvt_utf16, and std::codecvt_utf8_utf16) are deprecated in C++17. (The std::codecvt class template is NOT deprecated.)
@A.Danesh it was an aspirational deprecation, like with strstreams that were deprecated in C++98, but are still a mandatory part of C++20
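Given that deprecation, one dependency-free alternative on Linux is to encode by hand. This sketch assumes, as the comments on the question suggest, that wchar_t is 32 bits and holds UTF-32 code points (true with glibc; not on Windows, where wchar_t is 16-bit UTF-16). The name `utf32_to_utf8` is hypothetical, and invalid values are replaced with U+FFFD rather than rejected.

```cpp
#include <string>

static_assert(sizeof(wchar_t) == 4, "assumes 32-bit wchar_t holding UTF-32, as on Linux/glibc");

// Hypothetical helper: encodes each UTF-32 code point as 1-4 UTF-8 bytes.
std::string utf32_to_utf8(const std::wstring& in)
{
    std::string out;
    out.reserve(in.size());  // at least one byte per code point
    for (wchar_t wc : in) {
        char32_t cp = static_cast<char32_t>(wc);
        // Surrogates and values beyond U+10FFFF are not valid scalar values.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;
        if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes: 11110xxx then three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```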

The C++ language standard has no notion of explicit encodings. It only contains an opaque notion of a "system encoding", for which wchar_t is a "sufficiently large" type.

To convert from the opaque system encoding to an explicit external encoding, you must use an external library. The library of choice would be iconv() (converting from WCHAR_T to UTF-8), which is part of POSIX and available on many platforms; on Windows, the WideCharToMultiByte function with the CP_UTF8 code page is guaranteed to produce UTF-8.
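A minimal sketch of that iconv() route, assuming glibc, whose iconv accepts the encoding name "WCHAR_T" for the platform's wchar_t encoding (other iconv implementations may not). The helper name `to_utf8_iconv` is hypothetical, and error handling is reduced to throwing an exception.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts wchar_t text to UTF-8 via POSIX iconv().
std::string to_utf8_iconv(const std::wstring& in)
{
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");  // (tocode, fromcode)
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed");

    // iconv() takes a non-const char** for its input, hence the casts;
    // it does not modify the input bytes.
    char* inbuf = const_cast<char*>(reinterpret_cast<const char*>(in.data()));
    std::size_t inleft = in.size() * sizeof(wchar_t);

    // Worst case is 4 UTF-8 bytes per wchar_t.
    std::string out(in.size() * 4, '\0');
    char* outbuf = &out[0];
    std::size_t outleft = out.size();

    std::size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv conversion failed");

    out.resize(out.size() - outleft);  // drop the unused tail
    return out;
}
```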

C++11 adds new UTF-8 literals of the form std::string s = u8"Hello World: \U0010FFFF";. Those are already in UTF-8, but they cannot interface with the opaque wstring other than in the way I described. (Since C++20, u8 literals have type const char8_t[] and no longer initialize a std::string directly.)

See this question for a bit more background.

4 Comments

C++11's utf-8 strings can interface with wstrings through wstring_convert
@Cubbi: I remain unconvinced that that has anything to do with UTF8. It looks like a mere wrapper for wcstombs. (There's a header <cuchar> that looks more promising.)
wstring_convert is not related to wcstombs. It's a wrapper for codecvt facets, such as codecvt_utf8.
@Kerrek SB: I think I see your point: except for the scant <cuchar> functions, there is no portable connection between C++03's generic narrow-multibyte/wide conversions and C++11's explicit UTF-8/UTF-16/UTF-16LE/UCS-2/UTF-32/UCS-4 conversions. Interesting observation.

It's quite plausible that wcstombs will do what you need if what you actually want to do is convert from wide characters to the current locale.

If not, you probably need to look to ICU, Boost, or similar.
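A minimal sketch of the wcstombs route. The helper name `to_locale_mb` is hypothetical; the output is UTF-8 only if the program has selected a UTF-8 locale, for example by calling std::setlocale(LC_ALL, "") in an environment where LANG is something like en_US.UTF-8.

```cpp
#include <clocale>    // std::setlocale: callers must select a UTF-8 locale first
#include <cstdlib>    // std::wcstombs, std::size_t
#include <stdexcept>
#include <string>

// Hypothetical helper: converts to the multibyte encoding of the current
// C locale (LC_CTYPE). This is UTF-8 only when that locale is a UTF-8 one.
std::string to_locale_mb(const std::wstring& in)
{
    // A null destination asks only for the byte count (excluding the terminator).
    std::size_t len = std::wcstombs(nullptr, in.c_str(), 0);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in the current locale");

    std::string out(len, '\0');
    // Writes exactly len bytes; std::string tracks its own size, so no terminator is needed.
    std::wcstombs(&out[0], in.c_str(), len);
    return out;
}
```

Every C and C++ program starts in the "C" locale, so without a setlocale call like the one above, the result is not guaranteed to be UTF-8.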

2 Comments

wcstombs has no notion of specific encodings. This is not the answer.
wcstombs should work if and only if the current locale is UTF-8.

Certainly there is no function built into Linux, because the name Linux refers to the kernel only, which doesn't have anything to do with it. I seriously doubt that the libc that comes with gcc has such a function, and

$ man -k utf 

supports this theory. But there are plenty of good UTF-8 libraries around. I personally recommend the iconv library for such conversions.

1 Comment

Your man search lies to you: Linux glibc has an iconv implementation: gnu.org/s/hello/manual/libc/glibc-iconv-Implementation.html
