0

I'm new to C++ file io, so the other day I decided to write a small program that simply reads a UTF-8 encoded string and a paired float from a binary file. The pattern is string-float with no extra data or spacing between pairs. EDIT I've revised the code based on several answers. However, the output remains the same ("The RoommateAp 0");

string readString (ifstream* file) { //Get the length of the upcoming string uint16_t stringSize = 0; file->read(reinterpret_cast<char*>(&stringSize), sizeof(char) * 2); //Now that we know how long buffer should be, initialize it char* buffer = new char[stringSize + 1]; buffer[stringSize] = '\0'; //Read in a number of chars equal to stringSize file->read(buffer, stringSize); //Build a string out of the data string result = buffer; delete[] buffer; return result; } float readFloat (ifstream* file) { float buffer = 0; file->read(reinterpret_cast<char*>(&buffer), sizeof(float)); return buffer; } int main() { //Create new file that's open for reading ifstream file("movies.dat", ios::in|ios::binary); //Make sure the file is open before starting to read if (file.is_open()) { while (!file.eof()) { cout << readString(&file) << endl; cout << readFloat(&file) << endl; } file.close(); } else { cout << "Unable to open file" << endl; } } 

And a sample of data from the file (spaces for readability):

000C 54686520526F6F6D6D617465 41700000 

As one can see, the first two bytes are the length of the string (12 in this case), followed by twelve characters (which spell "The Roommate"), and the final four bytes are a float.

When I run this code, the only thing that happens is that the terminal hangs and I have to close it manually. I think it may be because I am reading past the end of the file, but I have no idea why this would happen. What am I doing incorrectly?

4
  • Is that the data as would be seen in a hex editor? Or a text editor? Commented Feb 13, 2011 at 21:14
  • That is the data as would be seen in a hex editor. Commented Feb 13, 2011 at 21:15
  • I hope you're on a big-endian machine. Because if you're on a little-endian machine, that's not 12, it's 3072. Commented Feb 13, 2011 at 21:45
  • I'm on an Intel machine, so little-endian. However, printing out the value of stringSize verifies that it's value is 12. Commented Feb 13, 2011 at 21:51

4 Answers 4

5

There are at least two issues. First, the line:

file->read(reinterpret_cast<char*>(stringSize), sizeof(char) * 2); 

Probably should take the address of stringSize:

file->read(reinterpret_cast<char*>(&stringSize), sizeof(stringSize)); 

Second, the line:

char* buffer = new char[stringSize]; 

Doesn't allocate enough memory, since it doesn't take the NUL terminator into account. That code should do something like:

//Now that we know how long buffer should be, initialize it char* buffer = new char[stringSize + 1]; //Read in a number of chars equal to stringSize file->read(buffer, stringSize); buffer[stringSize] = '\0'; 

Finally, the line:

return static_cast<string>(buffer); 

Fails to delete[] the buffer after instantiating a string from it, which will cause a memory leak.

Also note that std::string's UTF-8 support is quite poor out of the box. Fortunately, there are solutions.

Sign up to request clarification or add additional context in comments.

7 Comments

That was a silly mistake. I was changing code around trying different things, and I accidentally left out the &. Output is now The RoommateAp 0 (with a newline between RoommateAp and 0.
Damn - I missed the static_cast. May be worth mentioning that the std::string class has a method which gives the pointer to the buffer (similar to c_str but without a separate buffer or null terminating - I forget the spelling). Also, the resize method is important in this context, to ensure that both the string size and its underlying buffer are big enough.
You can static_cast a char* to a string, because std::string has a constructor that takes a char*.(but he does still need to delete the pointer)
Buffer is actually an array of chars, and isn't that what a string is as well?
@PigBen, you're absolutely right, apparently that even works if the constructor is explicit. Answer updated accordingly, thanks :)
|
2

Your code has a few serious problems:

  1. file->read(reinterpret_cast<char*>(stringSize), sizeof(char) * 2);

    in this part you are casting the current value of stringSize into a pointer. Your idea was most probably instead to pass the address of the stringSize variable.

  2. char* buffer = new char[stringSize];

    This allocates an array of chars, but no one is going to free it. Use std::vector<char> buffer(stringSize); instead so that the memory management will be done correctly for you. To get the address of the buffer you can use &buffer[0].

  3. return static_cast<string>(buffer);

    What you probably want is the string constructor that accepts pointers to first and one-past-last characters. In other words: return std::string(&buffer[0], &buffer[0]+stringSize);

Comments

0

You cast to a character array. Does the string size number in the file include space for the null character? If not, you might be running off the end of a character array and looping forever on the cout write.

2 Comments

Possibly... however, looking at the data, however, it doesn't appear that there are any null characters at the end of the strings. If you look at the first post, the string length is specified as 000C (12), and the string data is 12 bytes long. It's readable representation ("The Roommate") is 12 characters long, so I don't think there's any nulls there.
You read a size from the file and then populate a buffer of that size. Then you cast the buffer to a string and return it. It's picked up by the stream operator at that point; if the buffer didn't get populated with a null character, the stream operator might not know when to stop.
0

There are some issues with your code.

You state that the size of the string is specified by 4 bytes, that means you should use a uint32_t not a uint16_t.

You do not free the memory used when you allocate the string buffer.

string readString(std::ifstream* file) { // Get the length of the upcoming string. // The length of the string is specified with 4 bytes: use uint32_t, not uint16_t uint32_t stringSize = 0; file->read(reinterpret_cast<char*>(&stringSize), sizeof(uint32_t)); // Now that we know how long buffer should be, initialize it char* buffer = new char[stringSize + 1]; buffer[stringSize] = '\0'; // null terminate the string //Read in a number of chars equal to stringSize file->read(buffer, stringSize); string result = buffer; delete[] buffer; return result; } 

1 Comment

Woops, that's a mistake on my part. I meant to say the first two bytes. Also, I left out delete[] for the sake of brevity.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.