Array of non-contiguous objects

Question

    #include <iostream>
    #include <cstring>
    #include <type_traits>

    // This struct is not guaranteed to occupy contiguous storage
    // in the sense of the C++ Object model (§1.8.5):
    struct separated {
        int i;
        separated(int a, int b){i=a; i2=b;}
        ~separated(){i=i2=-1;} // nontrivial destructor --> not trivially copyable
    private:
        int i2; // different access control --> not standard layout
    };

    int main() {
        static_assert(not std::is_standard_layout<separated>::value,"sl");
        static_assert(not std::is_trivial<separated>::value,"tr");
        separated a[2]={{1,2},{3,4}};
        std::memset(&a[0],0,sizeof(a[0]));
        std::cout<<a[1].i;
        // No guarantee that the previous line outputs 3.
    }
    // compiled with Debian clang version 3.5.0-10, C++14-standard
    // (outputs 3)

What is the rationale behind weakening standard guarantees to the point that this program may show undefined behaviour?
The standard says: "An object of array type contains a contiguously allocated non-empty set of N subobjects of type T." [dcl.array] §8.3.4. If objects of type T do not occupy contiguous storage, how can an array of such objects do so?

edit: removed possibly distracting explanatory text

What do you mean the object does not occupy contiguous storage? Are you talking about the padding that could be in between the member variables?
– NathanOliver, Commented Sep 30, 2016 at 12:45
For your first question: Because no one wants to design C++ around C stuff like memset. C structs need to work with memset for compatibility, the rest does not really matter.
– Baum mit Augen ♦, Commented Sep 30, 2016 at 12:51
Where is this from? Have you run it and not gotten 3? There is a comment that says "No guarantee that ..." but I don't know who is asserting that.
– Kenny Ostrom, Commented Sep 30, 2016 at 13:26
@JoachimPileborg the standard permits parts of the storage required to implement object to be in completely separate memory regions (e.g. vtables)
– M.M, Commented Oct 2, 2016 at 12:02
There are many good reasons beside object non-contiguity why memsetting a "complex" object should be UB.
– n. m. could be an AI, Commented Oct 3, 2016 at 22:01

Community · Accepted Answer · 2017-05-23 12:16:31Z

1. This is an instance of Occam's razor as adopted by the dragons that actually write compilers: Do not give more guarantees than needed to solve the problem, because otherwise your workload will double without compensation. Sophisticated classes adapted to fancy hardware or to historic hardware were part of the problem. (As hinted at by Baum mit Augen and M.M)

2. (contiguous=sharing a common border, next or together in sequence)

First, it is not that objects of type T either always or never occupy contiguous storage. There may be different memory layouts for the same type within a single binary.

[class.derived] §10 (8): A base class subobject might have a layout different from ...

This would be enough to lean back and be satisfied that what is happening on our computers does not contradict the standard. But let's amend the question. A better question would be:

Does the standard permit arrays of objects that do not occupy contiguous storage individually, while at the same time every two successive subobjects share a common border?

If so, this would influence heavily how char* arithmetic relates to T* arithmetic.

Depending on whether you understand the OP's standard quote to mean that only the subobjects share a common border, or that the bytes within each subobject share a common border as well, you may arrive at different conclusions.

Assuming the first, you find that 'contiguously allocated' or 'stored contiguously' may simply mean &a[n]==&a[0] + n (§23.3.2.1), which is a statement about subobject addresses that would not imply that the array resides within a single sequence of contiguous bytes.

If you assume the stronger version, you may arrive at the 'element offset==sizeof(T)' conclusion brought forward in "T* versus char* pointer arithmetic". That would also imply that one could force otherwise possibly non-contiguous objects into a contiguous layout by declaring them T t[1]; instead of T t;

Now how to resolve this mess? There is a fundamentally ambiguous definition of the sizeof() operator in the standard that seems to be a relic of the time when, at least per architecture, type roughly equaled layout, which is not the case any more. (How does placement new know which layout to create?)

When applied to a class, the result [of sizeof()] is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. [expr.sizeof] §5.3.3 (2)

But wait, the amount of required padding depends on the layout, and a single type may have more than one layout. So we're bound to add a grain of salt and take the minimum over all possible layouts, or do something equally arbitrary.

Finally, the array definition would benefit from a disambiguation in terms of char* arithmetic, in case this is the intended meaning. Otherwise, the answer to question 1 applies accordingly.

A few remarks related to now deleted answers and comments: As is discussed in "Can technically objects occupy non-contiguous bytes of storage?", non-contiguous objects actually exist. Furthermore, memsetting a subobject naively may invalidate unrelated subobjects of the containing object, even for perfectly contiguous, trivially copyable objects:

    #include <iostream>
    #include <cstring>
    #include <type_traits>

    struct A {
    private:
        int a;
    public:
        short i;
    };

    struct B : A {
        short i;
    };

    int main() {
        static_assert(std::is_trivial<A>::value , "A not trivial.");
        static_assert(not std::is_standard_layout<A>::value , "sl.");
        static_assert(std::is_trivial<B>::value , "B not trivial.");
        B object;
        object.i=1;
        std::cout<< object.B::i;
        std::memset((void*)&(A&)object ,0,sizeof(A));
        std::cout<<object.B::i;
    }
    // outputs 10 (that is, 1 followed by 0) with g++/clang++, c++11, Debian 8, amd64

Therefore, it is conceivable that the memset in the question post might zero a[1].i, such that the program would output 0 instead of 3.

There are few occasions where one would use memset-like functions with C++ objects at all. (Normally, destructors of subobjects will fail blatantly if you do that.) But sometimes one wishes to scrub the contents of an 'almost-POD' class in its destructor, and this might be the exception.

Since one can place an object in a suitably aligned character array of an appropriate size, it seems that yes, one at least "could force otherwise possibly non-contiguous objects into a contiguous layout", regardless of one's interpretation of pointer arithmetic.
Further, since one can manually call a destructor and then forcibly place a new object in the now-empty storage location, it seems that an implementation has no choice but to use the same contiguous layout for all most-derived objects of the same type.
@n.m. I guess you mean placement-new, but the "non-contiguous" layout remains, there may be parts of the object that are not placed within the buffer. A vtable is a common example of this.
How can a single type have multiple layouts (on one compiler+OS+architecture)?
@M.M A vtable is not a part of an object by any stretch of imagination. It typically exists before the object is created and after it is destroyed, and is shared between many objects of the same type. If you call a vtable "a part of an object", call a function "a part of a function pointer".
