
If I have a struct in C++, is there no way to safely read/write it to a file that is cross-platform/compiler compatible?

Because if I understand correctly, every compiler 'pads' differently based on the target platform.

  • The efficiency (performance) gained by performing binary I/O often does not justify the money spent on research, design, development, and especially debugging and maintenance. Source code should be simple to understand, but no simpler. Commented Mar 22, 2011 at 22:24
  • @ThomasMatthews I don't want to sound like a troll, but at every job in my entire career it has mattered. Rather than focus on frequency, I'd focus on application. You have a point, but you sweep a massive counterexample under the rug. Trillions are made annually from the gains of binary I/O (financial markets/HFT, streaming media, HPC, military applications). For the average person's or company's website/application, yes, you'd most likely be correct. But when latency is a profit bottleneck/driver, you couldn't be more wrong. Code also needs to meet its requirements, which can include performance. Commented Aug 14, 2024 at 19:41
  • The application/program should be profiled to determine where the bottlenecks are. File I/O is often a major source of execution time (waiting for I/O to complete). Unless there is a tight specification for the format of the binary file, I would use human-readable formats: they reduce the duration of maintenance and debugging. At my shop, we transmit data out of the device in a binary format; we had to create a PC application to annotate the binary file and edit it. Likewise, we have a "compiler" to convert human-readable data to binary. Commented Aug 14, 2024 at 19:59

4 Answers


No. That is not possible, because C++ lacks standardization at the binary level.

Don Box writes (quoting from his book Essential COM, chapter "COM As A Better C++"):

C++ and Portability


Once the decision is made to distribute a C++ class as a DLL, one is faced with one of the fundamental weaknesses of C++, that is, lack of standardization at the binary level. Although the ISO/ANSI C++ Draft Working Paper attempts to codify which programs will compile and what the semantic effects of running them will be, it makes no attempt to standardize the binary runtime model of C++. The first time this problem will become evident is when a client tries to link against the FastString DLL's import library from a C++ development environment other than the one used to build the FastString DLL.

Struct padding is done differently by different compilers. Even with the same compiler, the packing alignment for structs can differ depending on which #pragma pack setting you're using.

Moreover, if you write two structs whose members are exactly the same, differing only in the order in which they're declared, the size of each struct can be (and often is) different.

For example, consider this:

    struct A { char c; char d; int i; };
    struct B { char c; int i; char d; };

    int main()
    {
        cout << sizeof(A) << endl;
        cout << sizeof(B) << endl;
    }

Compile it with gcc-4.3.4, and you get this output:

    8
    12

That is, sizes are different even though both structs have the same members!

The bottom line is that the standard doesn't talk about how padding should be done, and so the compilers are free to make any decision and you cannot assume all compilers make the same decision.
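One portable alternative, sketched below with hypothetical type and helper names, is to never write the struct as one raw block: define the file's byte layout explicitly and copy each field into it, so whatever padding the compiler chose never reaches the file:

```cpp
#include <cstdint>

// Hypothetical record type; the on-disk layout is defined byte by byte
// (little-endian here), so compiler padding never reaches the file.
struct Record {
    std::uint32_t id;
    std::uint16_t flags;
};

// Write a 32-bit value as four explicit little-endian bytes.
inline void put_u32(unsigned char* out, std::uint32_t v) {
    out[0] = static_cast<unsigned char>(v);
    out[1] = static_cast<unsigned char>(v >> 8);
    out[2] = static_cast<unsigned char>(v >> 16);
    out[3] = static_cast<unsigned char>(v >> 24);
}

// Write a 16-bit value as two explicit little-endian bytes.
inline void put_u16(unsigned char* out, std::uint16_t v) {
    out[0] = static_cast<unsigned char>(v);
    out[1] = static_cast<unsigned char>(v >> 8);
}

// Serialize field by field into a fixed six-byte wire layout.
inline void serialize(const Record& r, unsigned char buf[6]) {
    put_u32(buf, r.id);
    put_u16(buf + 4, r.flags);
}
```

Reading works the same way in reverse; the file format is then fixed regardless of compiler, padding, or host byte order.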


4 Comments

There is __attribute__((packed)) which I use for shared-memory structures as well as ones used to map network data. It does affect performance (see digitalvampire.org/blog/index.php/2006/07/31/… ) but it's a useful feature for network-related structs. (It's not a standard as far as I know, so the answer is still true).
I don't understand why struct A size is 8 and not more. { char c; // what about this? char d; // size 1 + padding of 3 int i; // size 4 };
@Dchris - the compiler is probably being careful to ensure that each field is aligned based on its own natural alignment. c and d are one byte and thus aligned no matter where you put them for the single-byte CPU instructions. The int however needs to be aligned on a 4-byte boundary, which to get there requires two bytes of padding after d. This gets you to 8.
It seems like most compilers would align members in the same way. Are there really compilers out there that would put padding between A::c and A::d? If there aren't, then am I correct in saying that the problem is only that the standard doesn't make any guarantees, even though every compiler seems to be doing the same thing (much like a reinterpret_cast)?

If you have the opportunity to design the struct yourself, it should be possible. The basic idea is to design it so that there is no need to insert pad bytes into it. The second trick is that you must handle differences in endianness.

I'll describe how to construct the struct using scalars, but you should be able to use nested structs as well, as long as you apply the same design to each included struct.

First, a basic fact in C and C++ is that the alignment of a type cannot exceed the size of the type. If it could, then it would not be possible to allocate memory using malloc(N*sizeof(the_type)).
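That fact can be spot-checked at compile time. A minimal sketch (the reasoning: array elements are laid out exactly sizeof(T) apart, and every element must satisfy the type's alignment, so the alignment cannot exceed the size):

```cpp
#include <cstdint>

// alignof(T) <= sizeof(T) must hold for the types used below;
// otherwise not every element of a malloc'd array could be aligned.
static_assert(alignof(std::uint8_t)  <= sizeof(std::uint8_t),  "align > size");
static_assert(alignof(std::uint32_t) <= sizeof(std::uint32_t), "align > size");
static_assert(alignof(std::uint64_t) <= sizeof(std::uint64_t), "align > size");
static_assert(alignof(double)        <= sizeof(double),        "align > size");
```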

Lay out the struct, starting with the largest types.

    struct {
        uint64_t alpha;
        uint32_t beta;
        uint32_t gamma;
        uint8_t  delta;

Next, pad out the struct manually, so that in the end you will match up the largest type:

        uint8_t  pad8[3];  // Match uint32_t
        uint32_t pad32;    // Even number of uint32_t
    };

The next step is to decide whether the struct should be stored in little- or big-endian format. The best way is to "swap" all the elements in situ before writing, or after reading, the struct if the storage format does not match the endianness of the host system.
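Putting the whole answer together, here is a sketch: the completed struct, a compile-time check that no hidden padding crept in (the 24-byte size holds on common platforms, though the standard does not guarantee it), and hypothetical helpers for the in-situ endianness swap:

```cpp
#include <cstdint>

struct Wire {
    std::uint64_t alpha;
    std::uint32_t beta;
    std::uint32_t gamma;
    std::uint8_t  delta;
    std::uint8_t  pad8[3];  // match uint32_t
    std::uint32_t pad32;    // even number of uint32_t
};
static_assert(sizeof(Wire) == 24, "unexpected compiler padding");

// Reverse the byte order of a 32-bit value.
inline std::uint32_t swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Reverse the byte order of a 64-bit value via two 32-bit swaps.
inline std::uint64_t swap64(std::uint64_t v) {
    return (std::uint64_t)swap32((std::uint32_t)(v >> 32))
         | ((std::uint64_t)swap32((std::uint32_t)v) << 32);
}

// Swap every multi-byte field in place, before writing or after reading,
// when the host endianness differs from the chosen storage endianness.
inline void swap_in_situ(Wire& w) {
    w.alpha = swap64(w.alpha);
    w.beta  = swap32(w.beta);
    w.gamma = swap32(w.gamma);
    // delta and the pad bytes are single bytes: nothing to swap.
}
```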

11 Comments

This sounds interesting. But can you go into more detail: why do you order the members by descending type size, and why did you pad it so that you have an even number of uint32_t?
@Phil, A basic type, like uint32_t, can (potentially) have an alignment requirement that matches its size, in this case four bytes. A compiler may insert padding to achieve this. By doing it manually, there is no need for the compiler to do so, as the alignment will always be correct. The drawback is that on systems with less strict alignment requirements, a manually padded struct will be larger than one padded by the compiler. You can do this in ascending or descending order, but you will need to insert more pads in the middle of the struct if you do it in ascending order...
... Padding in the end of the struct is only needed if you plan to use it in arrays.
@jwg. In the general case (like when you use a struct someone else has designed), padding can be inserted to ensure that no field ends up at a location the hardware can't read (as explained in the other answers). However, when you design the struct yourself, you can, with some care, ensure that no padding is needed. These two facts do not, in any way, oppose each other! I believe that this heuristic will hold for all possible architectures (given that a type doesn't have an alignment requirement greater than its size, which isn't legal in C anyway).
@Lindydancer - padding is needed if you intend to composite them into a contiguous memory block of random stuff, not necessarily just a homogeneous array. Padding can make you self-aligning on arbitrary boundaries such as sizeof(void*) or the size of a SIMD register.

No, there's no safe way. In addition to padding, you have to deal with different byte ordering, and different sizes of builtin types.

You need to define a file format, and convert your struct to and from that format. Serialization libraries (e.g. boost::serialization, or google's protocolbuffers) can help with this.
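If you do define the format by hand rather than pull in a library, the reading side reduces to decoding each field from explicit bytes. A minimal sketch, assuming a self-defined little-endian layout (the helper names are hypothetical):

```cpp
#include <cstdint>

// Decode a 32-bit little-endian value from four explicit bytes,
// independent of the host machine's own byte order.
inline std::uint32_t get_u32(const unsigned char* in) {
    return (std::uint32_t)in[0]
         | ((std::uint32_t)in[1] << 8)
         | ((std::uint32_t)in[2] << 16)
         | ((std::uint32_t)in[3] << 24);
}

// Decode a 16-bit little-endian value from two explicit bytes.
inline std::uint16_t get_u16(const unsigned char* in) {
    return (std::uint16_t)(in[0] | (in[1] << 8));
}
```

Because every byte position is spelled out, the same file decodes identically on any compiler and any host byte order.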

2 Comments

"The size of a structure (or class) may not be equal to the sum of the size of its members."
@Thomas: Exactly. And that's just the start of the fun.

Long story short, no. There is no platform-independent, Standard-conformant way to deal with padding.

Padding is called "alignment" in the Standard, and it begins discussing it in 3.9/5:

Object types have alignment requirements (3.9.1, 3.9.2). The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.

But it goes on from there and winds through many dark corners of the Standard. Alignment is "implementation-defined", meaning it can be different across different compilers, or even across address models (i.e. 32-bit/64-bit) under the same compiler.

Unless you have truly harsh performance requirements, you might consider storing your data on disc in a different format, such as char strings. Many high-performance protocols send everything using strings when the natural format might be something else. For example, a low-latency exchange feed I recently worked on sends dates as strings formatted like this: "20110321", and times are sent similarly: "141055.200". Even though this exchange feed sends 5 million messages per second all day long, they still use strings for everything, because that way they can avoid endianness and other issues.

4 Comments

So many things wrong here. Everything you claim about exchange feeds is wrong. Name a non-crypto, actual financial exchange that uses a non-binary feed format - not a silly retail client API. What exchange are you talking about? NYSE, ARCA, BATS, IEX, NASDAQ, etc., ALL use binary feeds. Sending a date as 20110321 doesn't make it non-binary or a string; it makes it an integer. I don't know ANY exchange that sends decimals as you displayed it; every exchange I've encountered uses implied decimal placement via integers. High-performance network protocols use binary I/O.
Even funnier is you have a github repo of markets specs and just about everyone is a binary feed.... really man? You're in Chicago... What does the CME use? FIX or BINARY, take a guess which one has more volume per day? (Hint it isn't close ) And FIX is barely non binary.
Uh, what's your problem? You're spewing vitriol about a post from 2011 - 13 years ago. It doesn't even matter whether I'm wrong in my claims or not (BTW, I'm not wrong; what I said was true in 2011), so why are you so fly-off-the-handle mad? And mad enough to dig up my ancient repos, which I haven't touched in years, just to find something I'm wrong about? Chill out, dude.
By the way, since you're rooting around my github, check out the QRTMD protocol (TMX Quantun). It sends timestamps as a 20 Byte character string.
