2

I am trying to read chunks of data from a file directly into a struct but the padding is causing too much data to be read and the data to be misaligned.

Do I have to manually read each part into the struct or is there an easier way to do this?

My code:

The structs

    typedef unsigned char byte;

    struct Header
    {
        char ID[10];
        int version;
    };

    struct Vertex // cannot rearrange the order of the members
    {
        byte flags;
        float vertex[3];
        char bone;
        byte referenceCount;
    };

How I am reading in the data:

    std::ifstream in(path.c_str(), std::ifstream::in | std::ifstream::binary);

    Header header;
    in.read((char*)&header.ID, sizeof(header.ID));
    header.ID[9] = '\0';
    in.read((char*)&header.version, sizeof(header.version));
    std::cout << header.ID << " " << header.version << "\n";

    in.read((char*)&NumVertices, sizeof(NumVertices));
    std::cout << NumVertices << "\n";

    std::vector<Vertex> Vertices(NumVertices);
    for(std::vector<Vertex>::iterator it = Vertices.begin(); it != Vertices.end(); ++it)
    {
        Vertex& v = (*it);
        in.read((char*)&v.flags, sizeof(v.flags));
        in.read((char*)&v.vertex, sizeof(v.vertex));
        in.read((char*)&v.bone, sizeof(v.bone));
        in.read((char*)&v.referenceCount, sizeof(v.referenceCount));
    }

I tried doing: in.read((char*)&Vertices[0], sizeof(Vertices[0]) * NumVertices); but this produces incorrect results because of what I believe to be the padding.

Also: at the moment I am using C-style casts, what would be the correct C++ cast to use in this scenario or is a C-style cast okay?

2
  • Regarding the last part of your question, you could use reinterpret_cast<char *>, which makes it very explicit. Commented Apr 27, 2011 at 13:08
  • There's more to this than I first thought :P Commented Apr 27, 2011 at 13:30

6 Answers

3

If the entire structure was written out in binary as a single block, you don't need to read each variable separately. Just read sizeof(Header) bytes from the file directly into the struct you have defined.

Header header; in.read((char*)&header, sizeof(Header)); 

If you're always running on the same architecture or the same machine, you won't need to worry about endian issues as you'll be writing them out the same way your application needs to read them in. If you are creating the file on one architecture and expect it to be portable/usable on another, then you will need to swap bytes accordingly. The way I have done this in the past is to create a swap method of my own. (for example Swap.h)

    // Swap.h - this is the header you use within your code
    void swap(unsigned char *x, int size);

    // SwapIntel.cpp - compile and link against this when building for Intel
    void swap(unsigned char *x, int size)
    {
        return; // Do nothing, assuming the file was written on Intel (little-endian)
    }

    // SwapSolaris.cpp - compile and link against this when building for Solaris
    void swap(unsigned char *x, int size)
    {
        // Byte-swapping code goes here to switch from little-endian to big-endian,
        // since the file was written on Intel and this implementation is used
        // in the Solaris build of your product
        return;
    }
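As a rough sketch of what the byte-swapping body could look like: the version below reverses the bytes of one scalar in place and, instead of selecting an implementation at link time as above, checks the host byte order at run time. The names `swap_bytes`, `host_is_little_endian` and `from_little_endian` are made up for illustration, and the file is assumed to be little-endian.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reverse the bytes of one scalar value in place.
void swap_bytes(unsigned char* x, int size)
{
    assert(x != nullptr && size >= 0);
    std::reverse(x, x + size);
}

// Hypothetical helper: true when the host stores the least significant byte first.
bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Convert a 32-bit value read from a little-endian file (an assumption) to host order.
std::uint32_t from_little_endian(std::uint32_t value)
{
    if (!host_is_little_endian())
        swap_bytes(reinterpret_cast<unsigned char*>(&value),
                   static_cast<int>(sizeof value));
    return value;
}
```

Swapping is applied per member, not to the struct as a whole: the members stay in the same order in the file, and only the bytes inside each multi-byte member (such as `version` or each `float`) are reversed. Single-byte members need no swapping.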

4 Comments

When you write the structure to file, it will be written with all of its data, including the null terminators and the padding. Therefore, when you read it back in, everything is put back into place the way it is expected.
Oh I understand (I think). The files weren't necessarily written on the same machine that's reading them so the files could have a different endianness too and this is why there's alignment issues too?
Yes, you'll want to make sure you're not writing out the data the same way you're reading it in currently. (i.e. you shouldn't have code like write((char*)&header.ID, 10) ) You should be writing the structure as a whole as in write((char*)&header, sizeof(header)); then read it in as stated above.
Brilliant, I wasn't aware that these stem from the way the file is structured too. Thank you. I have a question about the endianness: will the whole structure be in the file "backwards", or will the members still be in the same order but their data will be backwards?
2

No, you don't have to read each field separately. This is called alignment/packing. See http://en.wikipedia.org/wiki/Data_structure_alignment
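To see how much padding is involved, you can compare the size of the struct with the sum of its members. This is only a sketch using the `Vertex` struct from the question; the constant names are made up, and the exact amount of padding depends on the compiler and target.

```cpp
#include <cstddef>

typedef unsigned char byte;

struct Vertex
{
    byte  flags;
    float vertex[3];
    char  bone;
    byte  referenceCount;
};

// Bytes one vertex occupies in the file, assuming the file contains no padding.
const std::size_t kVertexFileSize =
    sizeof(byte) + sizeof(float[3]) + sizeof(char) + sizeof(byte);

// Bytes the compiler added between and after the members.
const std::size_t kVertexPadding = sizeof(Vertex) - kVertexFileSize;
```

With a 4-byte `float`, `kVertexFileSize` is 15. Whenever `kVertexPadding` is non-zero, reading `sizeof(Vertex) * NumVertices` bytes in one call consumes more than the file holds per record, which matches the misalignment described in the question.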

In this case the C-style cast behaves exactly like a reinterpret_cast, and you are using it correctly. You may use the C++-specific syntax instead, but it is a lot more typing.

3 Comments

"but it is a lot more typing." that's a very sad point, with today's tooling.
@jv42: Having two syntactically different options to describe the same expression I prefer the shortest one even if the editor auto-completes.
That's your choice and it's fine, I do use C-style casts a lot when coding in C++ myself. But the expressiveness of C++ style casts is much better, and conveys the intents without adding comments (ie am I breaking a const specifier, am I doing a type conversion or am I messing with pointers).
2

You can change padding by explicitly asking your compiler to align structs on 1 byte instead of 4 or whatever its default is. Depending on environment, this can be done in many different ways, sometimes file by file ('compilation unit') or even struct by struct (with pragmas and such) or only on the whole project.


2

header.ID[10] = '\0'; writes one element past the end of the array.

header.ID[9] is the last element of the array.

1 Comment

Oops! Been using Lua too much recently :P
1

If you are using a Microsoft compiler then explore the align pragma. There are also the alignment include files:

    #include <pshpack1.h>
    // your code here
    #include <poppack.h>

GCC has a different mechanism: attributes such as __attribute__((packed)) and __attribute__((aligned(n))), written on the structure definition, control its packing and alignment.
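For GCC (and Clang, which accepts the same syntax), a packed version of the question's `Header` could be sketched like this. It is compiler-specific and assumes a 4-byte `int`.

```cpp
#include <cstddef>

// GCC/Clang only: the packed attribute removes the padding
// that would otherwise follow ID so that version is aligned.
struct __attribute__((packed)) Header
{
    char ID[10];
    int  version;
};

// Assumes a 4-byte int: 10 + 4 bytes, with no padding in between.
static_assert(sizeof(Header) == 14, "Header must match the 14-byte file record");
```

With this definition, `in.read((char*)&header, sizeof(Header))` reads exactly the 14 bytes that the field-by-field reads in the question consume.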

3 Comments

Microsoft compiler also supports pack pragmas. I prefer using them as code gets more portable / compiler independent. See msdn.microsoft.com/en-us/library/ms253935.aspx
Once compiled, if the program is run on a different machine, will the structs still have the padding given by either method here? (I feel this might be a dumb question)
@Rarge - yes, padding affects the binary image, so is fixed at compile/link time.
0

If you are reading and writing this file yourself, try the Google Protobuf library. It will handle all byte-order, alignment, padding and language interop issues.

http://code.google.com/p/protobuf/

