I have the following piece of code:
```cpp
#include <iostream>
#include <string>

std::string eps("ε");

int main() {
    std::cout << eps << '\n';
    return 0;
}
```

Somehow it compiles with g++ and clang on Ubuntu, and it even prints the right character, ε. I also have an almost identical piece of code that happily reads ε from cin into a std::string. By the way, eps.size() is 2.
My question is: how does that work? How can a Unicode character be stored in a std::string? My guess is that the operating system handles all of the Unicode work, but I'm not sure.
EDIT
As for the output, I now understand that it is the terminal that is responsible for showing me the right character (ε in this case).
But what about input? cin reads characters up to ' ' or any other whitespace character (byte by byte, as I understand it). So if I take Ƞ, whose second byte is 32 (' '), it should read only the first byte and then stop. But it reads the whole Ƞ. How?
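wstring, which, by the way, doesn't know how to handle all of Unicode's complexities either. Ƞ (U+0220) is not stored as the hexadecimal bytes 02 20. Instead, it is encoded in UTF-8, which for Ƞ gives the two bytes C8 A0. In UTF-8, every byte of a multi-byte sequence is 0x80 or above, so none of them can be mistaken for a space (0x20) or any other ASCII whitespace, and cin reads the whole character.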