Below is a simplified example of my problem. I have some external byte data which appears to be a string with cp1252 encoded degree symbol 0xb0. When it is stored in my program as an std::string it is correctly represented as 0xffffffb0. However, when that string is then written to a file, the resulting file is only one byte long with just 0xb0. How do I write the string to the file? How does the concept of UTF-8 come into this?
    #include <iostream>
    #include <fstream>

    typedef struct {
        char n[40];
    } mystruct;

    static void dump(const std::string& name) {
        std::cout << "It is '" << name << "'" << std::endl;
        const char *p = name.data();
        for (size_t i = 0; i < name.size(); i++) {
            printf("0x%02x ", p[i]);
        }
        std::cout << std::endl;
    }

    int main() {
        const unsigned char raw_bytes[] = { 0xb0, 0x00 };

        mystruct foo;
        foo = *(mystruct *)raw_bytes;

        std::string name = std::string(foo.n);
        dump(name);

        std::ofstream my_out("/tmp/out.bin", std::ios::out | std::ios::binary);
        my_out << name;
        my_out.close();

        return 0;
    }

Running the above program produces the following on STDOUT:
    It is '�'
    0xffffffb0
`std::string name = std::string(foo.n);` -- this does not construct a string containing two characters. The 0x00 byte is treated as a terminator, so `name` holds exactly one byte, 0xb0, and the one-byte output file is therefore a faithful copy of the string. Two further problems:

1. `*(mystruct *)raw_bytes` is not legal; anything could happen. Copy the bytes with `memcpy` instead. (The `typedef` for the `struct` is also unnecessary in C++.)
2. `0xffffffb0` is the 0xb0 `char` value sign-extended when it is promoted to `int` for `printf`. It has nothing to do with ASCII, cp1252 or anything of this nature; cast to `unsigned char` before printing.