1

Is there any overhead to using a struct shim to replace a jagged array?

To give a specific example

vertices = new KeyValuePair<uint, EdgeData>[][]; 

vs

private struct Vertex { public KeyValuePair<uint, EdgeData>[] Arcs { get; set; } } vertices = new KeyValuePair<uint, Vertex>[]; 

EdgeData is a class if it makes any difference
Obviously the intent is clearer in the struct example but it needs to be able to hold massive graphs so any memory overhead is significant

13
  • How massive is massive? I ask only because of the size of the stack (struct allocation) verse heap allocation Commented Apr 24, 2013 at 18:18
  • Good question, I'm not sure what the maximum is likely to be, the first dinesion (vertices) will be much higher than the second. For the small data set I'm looking at the vertext count is around 20k and the Arc count (for each Vertex) is in the 4-5 range. I'll find a bigger dataset and see how it looks Commented Apr 24, 2013 at 18:22
  • 1
    One thing to watch out for is that if you have an array of structs, then #elements * sizeof(struct) must be less than 2GB. If you have an array of reference types, then #elements * sizeof(pointer) must be less than 2GB. If the structs are large, this can reduce the maximum array size quite a bit - but we're talking LARGE arrays in any case. Commented Apr 24, 2013 at 18:30
  • 1
    Arrays (even arrays of value types) are reference types. The stack does not come into play here. Commented Apr 24, 2013 at 18:30
  • 1
    Either way you parse it, struct isnt really the way to go on this one Commented Apr 24, 2013 at 18:34

3 Answers 3

4

A struct may or may not be allocated on the stack. Reference types can never be allocated on the stack; they are always allocated on the heap.

From the Standard (ISO 23270), § 8.8:

8.8 Structs The list of similarities between classes and structs is long—structs can implement interfaces, and can have the same kinds of members as classes. Structs differ from classes in several important ways, however: structs are value types rather than reference types, and inheritance is not supported for structs. Struct values are stored “on the stack” or “in-line”. Careful programmers can sometimes enhance performance through judicious use of structs.

For example, the use of a struct rather than a class for a Point can make a large difference in the number of memory allocations performed at run time. The program below creates and initializes an array of 100 points.

With Point implemented as a class, 101 separate objects are instantiated—one for the array and one each for the 100 elements.

class Point { public int x, y; public Point(int x, int y) { this.x = x; this.y = y; } } class Test { static void Main() { Point[] points = new Point[100]; for (int i = 0; i < 100; i++) { points[i] = new Point(i, i*i); } } 

If Point is instead implemented as a struct, as in

struct Point { public int x, y; public Point(int x, int y) { this.x = x; this.y = y; } } 

only one object is instantiated—the one for the array. The Point instances are allocated in-line within the array. This optimization can be misused. Using structs instead of classes can also make an application run slower or take up more memory, as passing a struct instance by value causes a copy of that struct to be created.

So the answer is "Maybe".

For your example, wrapping an array (a reference type) within a struct (a value type), doesn't mean anything: that array is still allocated on the heap.

If however, you change your class EdgeData to a struct, it can be (but may not be) allocated in-line within the array. So if your EdgeData class is, for instance, 16 bytes in size, and you create and populate an EdgeData[] of 100 entries, you are actually allocating 1 array instance (with a backing store sized to hold 100 object references, and 100 individual instances of your EdgeData class.

If EdgeData is a struct, you allocate 1 array with a backing store sized to hold 100 EdgeData instances (in this case, 1600 bytes, since our hypothetical EdgeData struct is 16 bytes in size.)

Iterating over the class version of the array, especially if the array is very large, may cause paging, since you are probably losing locality of reference as you jump all over the heap to hit the individual EdgeData instances.

Iteration over the struct version of the array preserves locality of reference, since the EdgeData instances are in-line.

Sign up to request clarification or add additional context in comments.

3 Comments

When would an array of a value type ever do anything other than store all the data for each element in sequence within a single heap object?
@supercat: See the answer at stackoverflow.com/questions/203695/… &mdash; Jon Skeet says it better than I can. Read the entire discussion for even more details. As Jon Skeet pointed out: "it's worth imagining a world where in fact all local variables live on the heap - which would still conform with the spec." See also Eric Lippert's essay at blogs.msdn.com/b/ericlippert/archive/2009/04/27/…
I didn't see where that discussion said anything about arrays. Having local variables in a heap object that was pinned at suitable times would yield marshal-by-ref semantics that are compatible with those of variables on the stack [each field would have an address, which wouldn't move as long as a valid byref existed to it, but could move or disappear once all byrefs were gone]. Storing arrays of blittable value types as anything other than an in-line concatenation would break any code that expects to use marshal-by-ref semantics.
2

Arrays of structures tend to be rather efficient, though in your particular example you have an extra uint for each row. Also, avoid exposing properties of structure types, and if a structure represents a collection of independent values bound together with duct tape (e.g. the coordinates of a point), simply expose those items as fields. While there are many situations in which the JIT will convert property accesses to field accesses, there are also many where it cannot.

If one is comparing the efficiency of:

struct FloatPoint2D {public float X,Y;} FloatPoint3D[] MyArray; 

versus

float[] MyXCoords, MyYCoords; 

Accessing both the X and Y of items in random sequence will be faster with the struct defined above than with a pair of separate arrays (typically one cache miss instead of two) but accessing just the X or just the Y coordinate of many items in sequence will be faster if one is using separate arrays (twice as many useful coordinates will be fetched with each cache line).

In your particular example, it's unclear exactly what data your type needs to encapsulate; your struct and non-struct examples hold different data, so it's hard to say one is "more efficient".

Comments

2

Replacing the 2D array with a 1D array of structs will not cause any issues. It's really a matter of how you view the data. If it makes more sense to model it as an array of structures, each of which contains an array of arcs, then that's the way you should express it in code.

There are some small differences in the way that they're stored. In particular, your 1D array approach will occupy more memory, total, than the 2D array approach. Basically, you have an extra uint for each row.

Following struck. It's discussing the difference between the struct approach and a 2D array (i.e. [,]), rather than the jagged array ([][]) that the OP is using.

Actually, the total memory used would be more than that. In the 2D array approach, you have (row * col) KeyValuePair structures in an array. The array has an allocation overhead of about 50 bytes in the 64-bit runtime (about 40 bytes, if I recall, in the 32-bit runtime). In the 1D array approach, you still have (row * col) KeyValuePair structures, but each one contains an array that has the same 50 byte allocation overhead. In addition, you have the vertices array, that contains (row) KeyvaluePair structures.

However, your 2D array (just the array) would require (rows * cols * (4 + sizeof(IntPtr))) bytes. The 1D vertices array will only require (rows * (4 + sizeof(IntPtr))) bytes. If you're limited to 2 gigabytes for a single array (as you are in .NET 4.0 and earlier, or in .NET 4.5 unless you enable very large objects), then you can potentially have more items, total, using the 1D array of structs than with the 2D array. Assuming, of course, that you have enough memory to hold that many KeyValuePair<uint, EdgeData> instances.

So your overall memory usage will increase, but your largest single allocation will be much smaller.

5 Comments

Is the extra uint/row storing the size of the array in the Vertex struct or does it come from somewhere else?
Is that true in both a normal 2D array and a jagged 2d array? Wouldn't there be more bookkeeping storage in a jagged array since (presumably) it needs to keep track of the size of each 'sub-array' in some way?
@DavidHayes: ah, you're right. I've been assuming a 2D array rather than a jagged array. I think I need to delete this answer.
It's still an interesting answer, anyone know how the book keeping on jagged arrays is handled?
@DavidHayes: A 2D jagged array is just a 1D array of arrays. That is, given int[][] a = new int[10][];, you can then write a[0] = new int[42];. You can also write int[] b = a[0]; Viewed that way, the bookkeeping is straightforward: each row incurs the overhead of an array allocation.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.