According to GCC's documentation and the various answers I read on Stack Overflow, it is allowed to use unions for type punning in C, like:

    union a_union {
        int i;
        double d;
    };

    int f() {
        union a_union t;
        t.d = 3.0;
        return t.i;
    }

But is it allowed to use unions to modify parts of an object? For example:

    #include <stdint.h>
    #include <assert.h>

    union test {
        uint32_t val;
        uint16_t part[2];
    };

    int main(void) {
        union test t;
        t.val = 0x12345678;
        t.part[0] = 0x4321;
        assert(t.val == 0x12344321);
        return 0;
    }

Assuming our machine is little-endian, will the assert always succeed?

The C standard says:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

But I do see similar usage in many places, such as the Linux kernel.

According to this answer, since sizeof(uint16_t [2]) == sizeof(uint32_t), this does not violate the rule above.

So, I modify my question:

    #include <stdint.h>
    #include <assert.h>

    union test {
        uint32_t val;
        uint8_t bit_0;
        struct {
            uint8_t _bit_0;
            uint8_t bit_8;
        };
    };

    int main(void) {
        union test t;
        t.val = 0x12345678;
        t.bit_0 = 0x21;
        t.bit_8 = 0x43;
        assert(t.val == 0x12344321);
        return 0;
    }

Is it still valid in this situation?

  • Observe that if writing one element of an array were considered to be a store solely to that element, not the whole array, then every other element of the array could change. This would completely break using more than one element in any array in any union. So it is not a reasonable interpretation of the C standard. Commented Aug 15 at 23:56
  • It is allowed, but it is very bad practice, and it is also subject to alignment issues and to non-transparent architecture dependencies. Commented Aug 16 at 15:13
  • @LuisColorado: There are no alignment issues. A union is properly aligned for all of its members. Commented Aug 18 at 10:08
  • @EricPostpischil, there can be, if the fields of the union are themselves structures. Using unions for type reinterpretation is architecture-dependent and so is not a portable coding style. Of course union is used for something in the language, but this is not the best use of it (probably the worst, instead). You can find that the field you try to access is not at the expected offset because of gaps introduced by the compiler. Reinterpreting binary data stored in memory is a dangerous path. Unfortunately, it is so widespread that people think it is the only way to proceed. Commented Aug 19 at 7:53

3 Answers


Going by a strict interpretation of 6.2.6.1p7:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

It seems clear that there's no guarantee that bytes of the larger member that are not part of the smaller member won't be modified if the smaller member is updated.

Certain compilers may guarantee this with a particular set of options, but the C standard does not.

You mentioned the Linux kernel which uses features of GCC to guarantee particular behaviors, although I was unable to find anything in the GCC documentation that provides this guarantee.
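
If the goal is only to replace some bytes of an object without leaning on the union wording at all, one option is to copy into the object representation with memcpy, which is defined for any object type. The following is a minimal sketch, not something taken from the kernel or from GCC's documentation; the helper name overwrite_first_two_bytes is hypothetical. Which half of the value changes still depends on the machine's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: overwrite the two lowest-addressed bytes of *val
 * with the object representation of part. The other two bytes of *val
 * are not touched. On a little-endian machine the two lowest-addressed
 * bytes hold the low 16 bits of the value. */
static void overwrite_first_two_bytes(uint32_t *val, uint16_t part)
{
    memcpy(val, &part, sizeof part);
}
```

On a little-endian machine, calling this helper on a variable holding 0x12345678 with part equal to 0x4321 leaves 0x12344321 in the variable.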


3 Comments

The Linux kernel will be compiled with special compiler options, and limited to the types of hardware supported by Linux. It doesn't say that much about the language in general.
I saw an LKML thread where Linus [Torvalds] said gcc did document it. At the time, IIRC, I looked for and found a gcc doc that did say so, but I've forgotten which one. But, see discussion under 0___________'s answer. Or, more directly, C23, 6.5.2.3, footnote 107. You'll be surprised (as I was) that union type punning is part of the standard.
@CraigEstey C89 (which Linux was originally written in) had it as implementation-defined (C90 6.3.2.3): "With one exception, if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined." And then followed the "common initial sequence" rule. This was relaxed in C99 and onward.

Union type-punning is OK so long as you don’t read a member in a way that would produce a trap representation for that member’s type. (Unsigned integer types don’t have trap representations; some signed/FP types can.)
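
As a rough illustration of reading a member other than the one last stored, here is a small sketch that reinterprets a float as a uint32_t. It assumes float is a 32-bit IEEE 754 binary32 type (checked only for size, at compile time); the names float_bits and float_to_bits are made up for this example. Because the value is read back as an exact-width unsigned type, the read cannot hit a trap representation.

```c
#include <stdint.h>

/* Assumption: float occupies exactly 32 bits on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Hypothetical union used only for this illustration. */
union float_bits {
    float f;
    uint32_t u;
};

/* Store through f, then read the same bytes back through u. */
static uint32_t float_to_bits(float f)
{
    union float_bits fb;
    fb.f = f;
    return fb.u;
}
```

With IEEE 754 binary32, float_to_bits(1.0f) gives 0x3F800000. Going the other way (storing u and reading f) is where a trap or signalling representation could in principle show up.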

In pre-C23, reading a different union member than the one last written is implementation-defined (GCC/Clang document it as supported). In C23 it’s explicitly allowed to access a union object through any of its members.

The rule you quoted (“bytes … that do not correspond to that member … take unspecified values”) doesn’t hurt your second example, because both members cover the same 4 bytes. When you store to t.part[0], you’re still writing within the union member part (the whole array), so there are no “other” bytes of the union that don’t correspond to that member.

So both of your examples are OK.

Assuming our machine is little-endian, will the assert always succeed?

Yes
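
If you would rather not hard-code the little-endian assumption, the index of the array element that overlays the low 16 bits can be probed at run time. This is only a sketch under the reading given above (a store to one element of part leaves the other element alone); low_half_index and set_low_half are hypothetical names, and mixed-endian layouts are not handled.

```c
#include <stdint.h>

union test {
    uint32_t val;
    uint16_t part[2];
};

/* Returns the index of the part[] element that overlays the low 16 bits
 * of val: 0 on little-endian machines, 1 on big-endian machines. */
static int low_half_index(void)
{
    union test probe;
    probe.val = 1;
    return probe.part[0] == 1 ? 0 : 1;
}

/* Replaces the low 16 bits of val via the union, whatever the byte order. */
static uint32_t set_low_half(uint32_t val, uint16_t low)
{
    union test t;
    t.val = val;
    t.part[low_half_index()] = low;
    return t.val;
}
```

With this, set_low_half(0x12345678, 0x4321) gives 0x12344321 on both little-endian and big-endian machines.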

Union punning is in common use. It’s a well-worn idiom in systems code (e.g., the Linux kernel, embedded firmware, protocol parsers, etc.).

Your last example:

    #include <stdint.h>
    #include <assert.h>

    union test {
        uint32_t val;
        uint8_t bit_0;
        struct {
            uint8_t _bit_0;
            uint8_t bit_8;
        };
    };

    int main(void) {
        union test t;
        t.val = 0x12345678;
        t.bit_0 = 0x21;
        t.bit_8 = 0x43;
        assert(t.val == 0x12344321);
        return 0;
    }

What will happen on the compilers you actually use (GCC/Clang, little-endian):

  • Storing to t.bit_0 sets the lowest byte to 0x21.

  • Storing to the anonymous struct’s bit_8 sets the next byte to 0x43.

  • The 32-bit value reads back as 0x12344321, so the assert passes.

Why this “works” in practice

  • Same storage is shared; both uint8_ts are contiguous (alignment 1 → no padding between them).

  • Endianness is little-endian, so byte 0 is the low 8 bits, byte 1 is the next.

  • GCC/Clang’s union semantics keep the other bytes intact.
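
The layout assumptions in this list can be checked at compile time. The sketch below assumes a C11 compiler (for anonymous structs and _Static_assert) and simply restates the union from the question; it says nothing about whether the untouched bytes keep their values, only where the members sit.

```c
#include <stddef.h>
#include <stdint.h>

union test {
    uint32_t val;
    uint8_t bit_0;
    struct {
        uint8_t _bit_0;
        uint8_t bit_8;
    };
};

/* bit_0 overlays the first byte of val. */
_Static_assert(offsetof(union test, bit_0) == 0, "bit_0 is not at offset 0");
/* No padding between _bit_0 and bit_8, so bit_8 overlays the second byte. */
_Static_assert(offsetof(union test, bit_8) == 1, "bit_8 is not at offset 1");
/* The union is no larger than its widest member. */
_Static_assert(sizeof(union test) == sizeof(uint32_t), "unexpected union size");
```

If any of these fail on a given target, the code in the question would not even touch the bytes it intends to.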

Why it isn’t strictly portable

  • By the letter of the standard (pre-C23, and even in C23 as far as the “unspecified bytes” rule goes), after t.bit_0 = 0x21; the other bytes of the union take unspecified values, since the standard does not guarantee that they are left intact. After t.bit_8 = 0x43; the remaining two bytes are still unspecified. Reading t.val then yields an unspecified result in pure ISO C terms, even though mainstream compilers do the intuitive thing.

It’s also endianness-dependent (big-endian would not match your expectation).
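
If portability matters more than the union idiom, the same update can be written with masks and shifts, which work on values rather than on bytes in memory and so do not depend on byte order or on the “unspecified bytes” rule. This is a sketch of that alternative, not part of the question's code; set_low16 and set_byte are hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical helper: replace the low 16 bits of val, keep the high 16. */
static uint32_t set_low16(uint32_t val, uint16_t low)
{
    return (val & UINT32_C(0xFFFF0000)) | low;
}

/* Hypothetical helper: replace one byte of val.
 * index 0 is the least significant byte; index must be in the range 0..3. */
static uint32_t set_byte(uint32_t val, unsigned index, uint8_t byte)
{
    unsigned shift = 8u * index;
    uint32_t mask = UINT32_C(0xFF) << shift;
    return (val & ~mask) | ((uint32_t)byte << shift);
}
```

For example, set_low16(0x12345678, 0x4321) gives 0x12344321, and so does set_byte(set_byte(0x12345678, 0, 0x21), 1, 0x43), on any conforming implementation.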

11 Comments

Thanks! So, what would happen if I changed uint16_t part[2]; to uint16_t part; and wrote t.part = 0x4321;?
Then the remaining bytes of the union would have unspecified values.
Where do versions of C before C 2024 say accessing a union member other than the last one written is implementation-defined, and where does C 2024 explicitly say it is allowed?
Dunno if this helps, but ... In a quick scan for "type punning" in open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf in 6.5.2.3 "Struct and union members", on page 75, there is a footnote: 107) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
@CraigEstey: I am well aware of the footnote. However, this answer asserts reading a different union member was implementation-defined in earlier versions of the standard and now is explicitly allowed. Neither is true to my knowledge, except to the extent the footnote specifies the behavior; the footnote has not changed. Since the footnote is non-normative, I would not say the standard explicitly says it is allowed.
Wow, I went all the way back to C11 and the footnote is there. So, not [merely] impl defined in older stds, but, explicitly allowed since [at least] C11. Non-normative [as in The core idea is that something is different from what is generally expected or considered standard]? It's been explicitly supported in gcc docs for a long time. I used it back in 1981. So, used in practice and the footnote language (in the standard) is quite clear. It is useful and well defined in C (type punning is much less of a sin in C than C++) if you know what you're doing with it.
It was always well-defined. However, some type conversions are implementation-defined, like unsigned integer to signed integer. There's no difference in C23 except mandatory two's complement so oddball signedness format need not be taken into account.

There are two reasons the "common initial sequence" guarantees say "inspect" rather than "modify":

  1. On some platforms, given something like:

     struct foo { int32_t a; int16_t b; };
     struct bar { int32_t a; int16_t b, c; };
     int get_foo_b(struct foo *p) { return p->b; }
     void set_foo_b(struct foo *p, int value) { p->b = value; }

    updating b without disturbing what follows might require two 8-bit stores, but if the following content wouldn't matter a 16-bit store could do the job. The Standard didn't want to forbid such treatment.

  2. Under C89, something like:

     union u { struct foo f; struct bar b; };
     int test(struct bar *bp1, struct bar *bp2)
     {
         if (get_foo_b(&((union u*)bp1)->f))
             bp2->b = 1;
         return get_foo_b(&((union u*)bp1)->f);
     }

    would have been transitively equivalent to:

     int test(struct bar *bp1, struct bar *bp2)
     {
         if (get_foo_b((struct foo*)bp1))
             bp2->b = 1;
         return get_foo_b((struct foo*)bp1);
     }

and although the former construct was rare, the latter construct was quite common. Given the nature of compiler optimizations when C89 was written, reliably supporting the latter construct (or the idiom exemplified thereby) in cases where the converted pointer will be used to read storage was trivial (treat a conversion from a struct foo into something else as a barrier to consolidating later accesses via struct foo with earlier ones), but properly handling writes may have required somewhat more complexity than the Committee was willing to mandate. I'm not sure if this was really a consideration, though, since the extra complexity wouldn't be huge, or if #1 was the bigger issue.

Note that because the Standard didn't mandate that compilers include the trivial amount of logic necessary to accommodate the constructs exemplified in #2, clang and gcc make no effort to support them unless type-based aliasing is disabled altogether.
