
I am currently learning how to work with UTF-XX encoded files and text.

I have this simple example:

std::ifstream ifs;
ifs.open("data/text.txt");
do {
    char c;
    ifs.get(c);
    printf("%x\n", c);
} while (!ifs.eof());

Where the file text.txt contains the following strings:

yabloko
яблоко

The result looks like this:

79
61
62
6c
6f
6b
6f
a
ffffffd1
ffffff8f
ffffffd0
ffffffb1
ffffffd0
ffffffbb
ffffffd0
ffffffbe
ffffffd0
ffffffba
ffffffd0
ffffffbe

I do understand why I get twice as many lines for the Cyrillic word (it is UTF-8 encoded, and each character in this case uses 2 bytes). My question is about what get() and printf() are doing. More precisely, why is my character c printed out as an int, with the first 3 bytes set to FF? When I look at the documentation for the get() method I see:

int get();
istream& get (char& c);

I tried both options. I see that the first one returns an int and the second takes a char, and I am really confused. Why would these functions extract anything other than a single byte (char) at a time from a file, and why is the value for the Cyrillic characters printed as, for example, ffffffd1 instead of d1?

  • Not related to your issue, but your do/while loop does not check that get() succeeded before calling printf(), so c will be garbage when EOF is reached. You should use a while loop instead: char c; while (ifs.get(c)) { printf("%hhx\n", c); } Commented Mar 8, 2017 at 22:48

1 Answer


More precisely why is my character c printed out as an int?

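Because a char is promoted to int when it is passed as a variadic (...) argument to printf. On your platform char is signed, so every byte value above 127 is a negative char and gets promoted to a negative int. The %x specifier then prints that int as an unsigned value, sign-extension bits included, which is why you see ffffffd1 rather than d1.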

You can use the %hhx format specifier to print a char; it tells printf to convert the promoted value back to unsigned char before printing.
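An alternative to %hhx is to convert the byte to unsigned char yourself before it reaches printf. The following is only a minimal sketch of that idea; the helper name byte_to_hex is hypothetical and not part of the original code, and it pads to two digits (so the newline prints as 0a rather than a).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the question): format one byte as two
// lowercase hex digits without any sign extension.
std::string byte_to_hex(char c) {
    char buf[3];  // two hex digits plus the terminating '\0'
    // Going through unsigned char maps the byte to 0..255 first, so the
    // value handed to the variadic call can never be negative.
    unsigned value = static_cast<unsigned>(static_cast<unsigned char>(c));
    std::snprintf(buf, sizeof buf, "%02x", value);
    return buf;
}
```

Inside the read loop, printf("%s\n", byte_to_hex(c).c_str()) would then print one two-digit value per byte, regardless of whether char is signed on the platform.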


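int istream::get() returns an int rather than a char so that the read character can be distinguished from EOF. Traits::eof() is normally int(-1), while a successfully read char is returned as a non-negative value (0 to 255 for 8-bit bytes), so no byte of the file, and therefore no part of any UTF-8 encoded character, can be mistaken for it.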
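To illustrate the int-returning overload, here is a small sketch of a read loop that compares each result against eof() before using it. The function name read_all_bytes is an assumption made for this example; it accepts any std::istream, so an std::ifstream opened on data/text.txt would work the same way.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): collect every byte of a
// stream using the int-returning get() overload.
std::vector<int> read_all_bytes(std::istream& in) {
    std::vector<int> bytes;
    int ch;
    // get() yields eof() once nothing more can be read; every real byte
    // comes back as a non-negative value, so the comparison is unambiguous.
    while ((ch = in.get()) != std::char_traits<char>::eof()) {
        bytes.push_back(ch);
    }
    return bytes;
}
```

Because the values are already non-negative ints, printing them with %x needs no cast, and even a 0xff byte in the file stays distinct from the end-of-file marker.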


3 Comments

Nice, thank you. I am tempted to accept your answer) but do you also know why the get() method returns an int rather than a char?
@user18490 Added a note about int get() for you.
Makes sense) Thx a lot
