In reference to one of the previous answers: you should not use wchar_t and the w* functions on Linux. POSIX APIs use the char data type, and most POSIX implementations use UTF-8 as the default encoding. Quoting the C++ standard (ISO/IEC 14882:2011):
5.3.3 Sizeof
sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. The result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined. [ Note: in particular, sizeof(bool), sizeof(char16_t), sizeof(char32_t), and sizeof(wchar_t) are implementation-defined. — end note ]
UTF-8 uses 1-byte code units and up to 4 code units to represent a code point, so char is enough to store UTF-8 strings. To manipulate them, though, you need to find out whether a specific code point is represented by multiple code units (bytes) and build your processing logic with that in mind. wchar_t has an implementation-defined size; on the Linux distributions that I have seen, it is 4 bytes.
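As a rough illustration of that kind of processing logic, here is a minimal sketch that counts code points rather than bytes. The function names are hypothetical, and the sketch assumes the input is already well-formed UTF-8; it performs no validation. It relies on the fact that every continuation byte in UTF-8 has the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// Returns true if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool is_continuation_byte(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Counts code points in a string that is assumed to hold well-formed UTF-8.
// Each code point has exactly one non-continuation (lead) byte, so counting
// those bytes gives the number of code points.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if (!is_continuation_byte(byte)) {
            ++count;
        }
    }
    return count;
}
```

For example, the 4-byte string "a\xE1\x88\xB4" (the letter a followed by U+1234) has size() == 4, but count_code_points reports 2.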
Another problem is that the translation from source code to object code may transform your encoding in a compiler-specific way:
2.2 Phases of translation
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary.
Anyway, in most cases no conversion is applied to your source code, so the strings that you put into a char* stay unmodified. If you encode your source code as UTF-8, then you are going to have bytes representing UTF-8 code units in your char*s.
As for your code example: it does not work as expected because a char is 1 byte in size, while a Unicode code point may require several (up to 4) UTF-8 code units to be serialized (in UTF-8, 1 code unit == 1 byte). You can see here that U+1234 requires three bytes, E1 88 B4, when UTF-8 is used and therefore cannot be stored in a single char. If you modify your code as follows, it is going to work just fine:
    #include <iostream>

    int main() {
        // String literals are const in C++11, so the pointer must be const char*.
        const char* str = "\u1234";
        std::cout << str << std::endl;
        return 0;
    }
This is going to output ሴ. You may see nothing, depending on your console and the installed fonts, but the actual bytes are going to be there. Note that with double quotes you also get a \0 terminator in memory.
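If the console shows nothing, one way to check that the bytes really are there is to print them in hexadecimal. The helper below is a hypothetical sketch, not part of the original code; it simply formats each byte of a string as two hex digits.

```cpp
#include <string>

// Formats every byte of the input as two uppercase hex digits,
// separated by single spaces (for example "E1 88 B4").
std::string to_hex_bytes(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : bytes) {
        if (!out.empty()) {
            out += ' ';
        }
        out += digits[byte >> 4];    // high nibble
        out += digits[byte & 0x0F];  // low nibble
    }
    return out;
}
```

Calling to_hex_bytes on the three bytes "\xE1\x88\xB4" gives E1 88 B4. Passing "\u1234" instead should give the same result only if your compiler's execution character set is UTF-8, which is an assumption about your toolchain and settings.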
You could also use an array, but not with single quotes since you would need a different data type (see here for more information):
    #include <iostream>

    int main() {
        // String literals are const in C++11, so the pointer must be const char*.
        const char* str = "\u1234";
        std::cout << str << std::endl;

        // size of the array is 4 because \0 is appended
        // for string literals and there are 3 bytes
        // needed to represent the code point
        char arr[4] = "\u1234";
        std::cout.write(arr, 3);
        std::cout << std::endl;
        return 0;
    }
The output in this case is going to be ሴ on two separate lines.
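To see where the three bytes E1 88 B4 for U+1234 come from, here is a minimal sketch of the UTF-8 encoding rules written by hand. The function name encode_utf8 is hypothetical, and the sketch assumes the input is a valid code point (at most U+10FFFF and not a surrogate); it does no validation.

```cpp
#include <string>

// Encodes a single code point as UTF-8.
// Assumption: cp is a valid Unicode scalar value; nothing is checked here.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+1234 the third branch applies: 0x1234 >> 12 is 0x1, giving 0xE1; the middle six bits are 0x08, giving 0x88; the low six bits are 0x34, giving 0xB4. That is why a single char cannot hold this code point, while a 4-element array (3 bytes plus the terminator) can.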