1

I wrote a simple function that reads a whole file into a buffer.

#include <iostream>
#include <fstream>

int main() {
    std::ios_base::sync_with_stdio(0);
    std::ifstream t;
    t.open("C:\\Users\\sufal\\Desktop\\test.txt");
    t.seekg(0, std::ios::end);
    long length = t.tellg();
    t.seekg(0, std::ios::beg);
    std::cout << "file size: " << length << std::endl;
    char* buffer = new char[length+1];
    t.read(buffer, length);
    t.close();
    buffer[length] = 0;
    std::cout << buffer << std::endl;
    return 0;
}

And this is test.txt:

1
2
3

The output that the program produces looks like this: [screenshot of the program output]

The file size should be 5 bytes. Why does my program show the wrong file size? Windows Explorer also seems to show a wrong file size of 7 bytes.

1
  • 1
    This doesn’t address the question, but get in the habit of initializing objects with meaningful values rather than default-initializing them and immediately overwriting the default values. In this case, that means changing std::ifstream t; t.open("C:\\Users\\sufal\\Desktop\\test.txt"); to std::ifstream t("C:\\Users\\sufal\\Desktop\\test.txt");. Also, you don’t have to call t.close();. The destructor will do that. Commented Nov 29, 2020 at 23:05

3 Answers

4

On Windows, a newline is stored as the two-byte sequence "\r\n" (CR followed by LF). So, if your file does not end with a newline, 7 is indeed its size:

1 <-- 1 byte for '1', 2 bytes for CRLF
2 <-- 1 byte for '2', 2 bytes for CRLF
3 <-- 1 byte for '3'

To read the file correctly on a byte level you need to open it in binary mode:

t.open("C:\\Users\\sufal\\Desktop\\test.txt", std::ios_base::binary);

(you can read about the details of this behavior in the documentation).
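For illustration, here is a minimal sketch of the whole read done in binary mode. The helper name read_all_bytes is made up for this example, and the error handling is only one possible choice. It keeps the seek-to-end approach from the question but stores the bytes in a std::vector so nothing has to be freed by hand.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): returns every byte of the
// file at `path`, with no newline translation applied.
std::vector<char> read_all_bytes(const std::string& path) {
    // Binary mode: "\r\n" stays as two bytes, so tellg() matches what read() delivers.
    std::ifstream in(path, std::ios_base::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    in.seekg(0, std::ios::end);
    const std::streamoff length = in.tellg();
    in.seekg(0, std::ios::beg);
    if (length < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }

    std::vector<char> buffer(static_cast<std::size_t>(length));
    in.read(buffer.data(), length);
    // Keep only what was actually read, in case the read came up short.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

For the 7-byte test.txt from the question, the returned vector should hold 7 chars, including both '\r' characters.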

You can also see other options to read the whole file into a string in C++:
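One such option, sketched here under the assumption that the file really is text and that text-mode newline translation is wanted, is to build a std::string from stream-buffer iterators. The function name slurp_text is hypothetical. Because the string grows to whatever the stream delivers, there is no size to compute up front and no terminator to place by hand.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the whole file in text mode into a std::string.
// On Windows each CRLF arrives as a single '\n'.
std::string slurp_text(const std::string& path) {
    std::ifstream in(path);  // default text mode
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    // Copies characters until the end of the stream is reached.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Printing the result with std::cout << slurp_text(path) needs no manual null terminator, since std::string tracks its own length.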


2 Comments

So binary mode is also applicable for reading text files?
@olaf No, but your code is written in a way that assumes reading a binary file - byte by byte. Without this flag, ifstream interprets the newline characters and modifies them, hence your artefacts. See the linked questions and their answers for ways to read the file taking advantage of it being a text file.
2

Your file is 7 bytes in size, because it uses CRLF line breaks.

1[cr][lf]
2[cr][lf]
3

But you are opening the file in text mode, which on Windows normalizes CRLF line breaks to LF. You are allocating room for 7 chars (plus the terminator) in your buffer, but read() stores only 5 chars in it:

1[lf]
2[lf]
3

That is why you see the two extra = characters at the end of the printed output: you didn’t zero out the unused buffer space, so you are seeing random garbage from uninitialized memory.

To do what you are attempting, open the file in binary mode instead.

t.open("C:\\Users\\sufal\\Desktop\\test.txt", std::ios_base::binary); 

See Binary and text modes on cppreference.com for more details.
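If you would rather keep text mode, one possible approach (a sketch, not the only way) is to treat the size from tellg() as an upper bound and ask the stream how many chars read() actually stored, using gcount(). The helper name read_text_trimmed is made up for this example.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads a file in text mode and trims the buffer to the
// number of chars that read() really delivered.
std::string read_text_trimmed(const std::string& path) {
    std::ifstream in(path);  // text mode, as in the question
    if (!in) {
        return std::string();
    }

    in.seekg(0, std::ios::end);
    const std::streamoff length = in.tellg();  // on-disk size: an upper bound
    in.seekg(0, std::ios::beg);
    if (length <= 0) {
        return std::string();
    }

    std::string buffer(static_cast<std::size_t>(length), '\0');
    in.read(&buffer[0], length);
    // gcount() reports the chars stored by the last read(); after CRLF
    // translation this can be smaller than `length`.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

With the file from the question on Windows, the buffer would start at 7 chars and be trimmed to 5, so no uninitialized bytes get printed.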


1

On Windows this file is indeed 7 bytes: 1 \r\n 2 \r\n 3

Windows encodes a newline as two bytes: CR + LF (or \r + \n in other notation).

So the reported size of 7 bytes is correct.
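The arithmetic can be checked with a small sketch. The function names count_crlf and text_mode_size are hypothetical, and the code works on the raw bytes as they would come out of a binary-mode read.

```cpp
#include <cstddef>
#include <string>

// Counts "\r\n" pairs in the raw (binary-mode) contents of a file.
std::size_t count_crlf(const std::string& raw) {
    std::size_t pairs = 0;
    for (std::size_t i = 0; i + 1 < raw.size(); ++i) {
        if (raw[i] == '\r' && raw[i + 1] == '\n') {
            ++pairs;
            ++i;  // skip the '\n' that belongs to this pair
        }
    }
    return pairs;
}

// Number of chars left once every CRLF pair is collapsed to a single '\n',
// which is what a Windows text-mode read delivers.
std::size_t text_mode_size(const std::string& raw) {
    return raw.size() - count_crlf(raw);
}
```

For the raw contents "1\r\n2\r\n3", raw.size() is 7, count_crlf gives 2, and text_mode_size gives 5, which matches the 7 bytes on disk versus the 5 chars the program expected.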

8 Comments

So if I just want to read the whole file, how should I handle these double EOL characters?
You will read the file just fine. You can easily assume that every \r character ends a line and skip the next byte. It's ALWAYS \r\n on Windows and \r isn't used anywhere else (basically).
\r = 13 in decimal, \n = 10 decimal
So that is the reason for the two equals signs at the end of the output?
@loa_in_ + 1 is necessary to store the 0 at the end of the buffer, to have a correct null-terminated string, for cout to print it correctly.