I believe the output has to do with UTF-8, but I do not know how. Could someone please explain?
#include <iostream>
#include <cstdint>
#include <iomanip>
#include <string>

int main()
{
    std::cout << "sizeof(char) = " << sizeof(char) << std::endl;
    std::cout << "sizeof(std::string::value_type) = " << sizeof(std::string::value_type) << std::endl;

    std::string _s1 ("abcde");
    std::cout << "s1 = " << _s1 << ", _s1.size() = " << _s1.size() << std::endl;

    std::string _s2 ("abcdé");
    std::cout << "s2 = " << _s2 << ", _s2.size() = " << _s2.size() << std::endl;

    return 0;
}

The output is:
sizeof(char) = 1
sizeof(std::string::value_type) = 1
s1 = abcde, _s1.size() = 5
s2 = abcdé, _s2.size() = 6

g++ --version prints g++ (Ubuntu 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609
Qt Creator compiles like this:
g++ -c -m32 -pipe -g -std=c++0x -Wall -W -fPIC -I../strsize -I. -I../../Qt/5.5/gcc/mkspecs/linux-g++-32 -o main.o ../strsize/main.cpp
g++ -m32 -Wl,-rpath,/home/rodrigo/Qt/5.5/gcc -o strsize main.o

Thanks a lot!
Print sizeof('é') and see what you get.

std::cout << "sizeof('é') = " << sizeof('é') << std::endl;
std::cout << "sizeof(\"é\") = " << sizeof("é") << std::endl;

And the output was:

sizeof('é') = 4
sizeof("é") = 3

sizeof('é') is 4 most likely because, in a UTF-8 source file, 'é' is two bytes, which makes it a multicharacter literal of type int rather than a char, so no promotion is involved. A string literal "é" has type const char[N], so sizeof("é") is 3: the é is encoded as 2 chars in UTF-8 (0xC3 0xA9), followed by the null terminator.

How can std::cout know that the bytes in positions 5 and 6 of abcdé must be combined into a two-byte value before printing?

It doesn't. It blindly outputs the 6 bytes of the string, regardless of their content. Your console (i.e. the terminal, bash, et al.) is set to a UTF-8 environment and displays the appropriate glyph. See How to set up a clean UTF-8 environment in Linux.