trouble reading binary data

Question

The reader and writer

    #include <string>
    #include <fstream>
    #include <memory>

    class BinarySearchFile {

        BinarySearchFile::BinarySearchFile(std::string file_name) {
            // concatenate extension to fileName
            file_name += ".dat";
            // form complete table data filename
            data_file_name = file_name;
            // create or reopen table data file for reading and writing
            binary_search_file.open(data_file_name, std::ios::binary);
            // create file
            if (!binary_search_file.is_open()) {
                binary_search_file.clear();
                binary_search_file.open(data_file_name, std::ios::out | std::ios::binary);
                binary_search_file.close();
                binary_search_file.open(data_file_name), std::ios::out | std::ios::in | std::ios::binary | std::ios::ate;
            }
        }

        std::fstream binary_search_file;

        void BinarySearchFile::writeT(std::string attribute) {
            if (binary_search_file) {
                binary_search_file.write(reinterpret_cast<char *>(&attribute), attribute.length() * 2);
            }
        }

        std::string BinarySearchFile::readT(long filePointerLocation, long sizeOfData) {
            if (binary_search_file) {
                std::string data;
                data.resize(sizeOfData);
                binary_search_file.seekp(filePointerLocation);
                binary_search_file.seekg(filePointerLocation);
                binary_search_file.read(&data[0], sizeOfData);
                return data;
            }
        }
    };

The reader call

    while (true) {
        std::unique_ptr<BinarySearchFile> data_file(new BinarySearchFile("classroom.dat"));
        std::string attribute_value = data_file->read_data(0, 20);
    }

The writer call

    data_file->write_data("packard ");

The writer writes a total of 50 bytes

"packard 101 500 "

The reader should read the first 20 bytes, but the result is "X packard X", where X represents some malformed bytes of data. Why is the data that is read back corrupt?

A file is a stream of bytes. If you want to write to a file, you need a stream of bytes to write to that file that follows whatever file format you want. Do you have a file format? Do you create a stream of bytes in that format? You're expecting this to work by magic.
– David Schwartz, Commented Apr 16, 2013 at 15:37
"Do you have a file format?" Binary! "Do you create a stream of bytes in that format?" I believe I do, but apparently incorrectly.
– Mushy, Commented Apr 16, 2013 at 15:41
If you have a file format, what is the meaning of the first byte? And where is the code that puts that specific information into the first byte of the data you write to the file?
– David Schwartz, Commented Apr 16, 2013 at 15:41
@Mushy Binary is not a file format. It's simply a rough indication that the format you're using isn't restricted to printable characters.
– James Kanze, Commented Apr 16, 2013 at 15:43
Yes, I have a file format that uses char as a two-byte type, which would make writing "packard " 20 bytes. I write those 20 bytes using std::fstream::write() and subsequently read those 20 bytes using std::fstream::read().
– Mushy, Commented Apr 16, 2013 at 15:56

James Kanze · Accepted Answer · 2013-04-16 15:34:08Z

2

You can't simply write data out by casting its address to a char* and hoping to get anything useful. You have to define the binary format you want to use, and implement it. In the case of std::string, this may mean outputting the length in some format, then the actual data. Or, in the case where fixed-length fields are needed, forcing the string (or a copy of the string) to that length using std::string::resize, then outputting that, using std::string::data() to get your char const*.

Reading will, of course, be similar. You'll read the data into a std::vector<char> (or for fixed length fields, a char[]), and parse it.


4 Comments

Mushy Over a year ago

Yes, thank you. I modified the writer as follows: attribute.resize(attribute.length() * 2); const char *write_this = attribute.data(); binary_search_file.write(write_this, attribute.length()); and the reader as follows: char data[20]; binary_search_file.read(data, sizeOfData); and I get what I desire but need to trim it so the actual data is correct

James Kanze Over a year ago

@Mushy That should almost work. I don't think that the resize to length * 2 does what you seem to want, however; it just adds attribute.length() bytes of '\0' to the end of the string. Why do you want 2 bytes for each character, and what does the second byte represent? If you want UTF-16, and the input string is UTF-8, you'll need explicit transcoding, and the final length will depend on the contents of your string. (And of course, everyone else does the opposite: UTF-16 or UTF-32 internally, and UTF-8 in files and on the network.)
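The effect of that resize can be checked in isolation. In this small sketch, `doubled` is a hypothetical helper that only mirrors the `attribute.resize(attribute.length() * 2)` call from the comment above.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `s` resized to twice its length.
// The original characters stay at the front; the added positions hold '\0'.
std::string doubled(std::string s) {
    s.resize(s.length() * 2);
    return s;
}
```

For an input such as "ab", the result is the four bytes 'a', 'b', '\0', '\0': the padding is appended at the end rather than interleaved with the characters, so it is not a two-byte-per-character encoding.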

Mushy Over a year ago

I want a two-byte char because I am converting a Java program, where a two-byte char is used, to C++, where char is one byte. To maintain an ordered format in the conversion and make verification easier, I am choosing to use a two-byte char. If I am not representing a two-byte char properly, I am open to doing it correctly through transcoding or conversion if necessary to maintain my desired format.

James Kanze Over a year ago

@Mushy OK. Java's external format is UTF-16BE. If your encoding is ISO 8859-1, or pure ASCII, then you can simply set the top byte to 0; otherwise, you'll have to use a more classical technique for transcoding. There are many ways of doing this, but the simplest would be to create a std::vector<char>, then loop over the input, inserting first '\0', then the character into the vector, and finally writing v.data() (if you have C++11) or &v[0] to the output. (Or you can write to the output directly: dest.put() for each byte.)

alexrider · Accepted Answer · 2013-04-16 15:37:06Z

0

binary_search_file.write(reinterpret_cast<char *>(&attribute), attribute.length() * 2);
It is incorrect to cast the address of a std::string to char*. If you need a char*, you must use attribute.c_str().
Apart from the pointer to the characters, a std::string object contains other data members (for example, the allocator), and your code will write that data to the file instead of the characters. Also, I don't see any reason to multiply the string length by 2; adding 1 makes sense if you want to output the terminating zero.

edited Apr 16, 2013 at 15:37

answered Apr 16, 2013 at 15:32

11 Comments

James Kanze Over a year ago

Any time you need a reinterpret_cast, unless you're doing really low level work (e.g. like implementing malloc), you should be suspicious.

alexrider Over a year ago

@JamesKanze In the case of c_str(), there is no need for a reinterpret_cast, since there is already a char* on hand. Or did I miss something?

James Kanze Over a year ago

The case of c_str() is a case where it is broken without needing a reinterpret_cast:-). You need some way in the file to recover the length.

alexrider Over a year ago

@JamesKanze Won't terminal zero be enough?

James Kanze Over a year ago

It might, if you actually write it. It depends on the format, and how you read it.
