edited Jun 25 at 19:45


The thing to understand is that C# has real arrays, in the formal computer science sense. This means several things, among them:

Fixed size
Contiguous memory
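
As a minimal sketch of the fixed-size property (the class name `ArrayBasics` is just a label for this illustration): an array's length is set when it is created, and `Array.Resize` does not grow the array in place. It allocates a new array, copies the elements over, and repoints the variable.

```csharp
using System;

public static class ArrayBasics
{
    public static void Main()
    {
        // The length is chosen at creation and never changes for this object.
        int[] numbers = new int[3];
        numbers[0] = 10;
        Console.WriteLine(numbers.Length);   // 3

        // There is no Add method. "Growing" means a new allocation plus a copy.
        int[] original = numbers;
        Array.Resize(ref numbers, 5);

        Console.WriteLine(numbers.Length);                            // 5
        Console.WriteLine(original.Length);                           // 3
        Console.WriteLine(object.ReferenceEquals(original, numbers)); // False
        Console.WriteLine(numbers[0]);                                // 10 (copied over)
    }
}
```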

Many other platforms have an array construct of some kind which is in fact NOT an array, at least in the formal computer science sense, because it violates one (and usually both) of those features (though they will often paper over the latter). Rather, the platforms offer a collection with the "array" name merely attached to it.

These other platforms are correct to do this, in a sense, because it turns out real computer-science-style arrays are not what we need most of the time; a collection type is almost always far more appropriate. That is, it's extremely common to want to append or remove items, to look up elements by key rather than by index, or to have contents that are immutable. Simple arrays guarantee none of these things.

Thankfully, C# includes a number of collection types as well to fill in these gaps and more, and we can use types such as List<T>, Dictionary<TKey, TValue>, and many others. It's worth noting these collection types tend to also closely relate to computer science concepts of their own, though the naming is not always perfect (it's possible List<T> should have been named Vector<T>, for example).
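
To make the three needs mentioned above concrete, here is a rough sketch. The names and sample data are invented for illustration, and the immutable example assumes a modern .NET where `System.Collections.Immutable` is available out of the box (on older frameworks it is a separate NuGet package).

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

public static class CollectionChoices
{
    public static void Main()
    {
        // Append and remove: List<T> is a growable sequence.
        var names = new List<string> { "Ada", "Grace" };
        names.Add("Linus");
        names.Remove("Grace");
        Console.WriteLine(string.Join(", ", names));   // Ada, Linus

        // Look up by key rather than by index: Dictionary<TKey, TValue>.
        var ages = new Dictionary<string, int> { ["Ada"] = 36 };
        if (ages.TryGetValue("Ada", out int age))
        {
            Console.WriteLine(age);                    // 36
        }

        // Immutable contents: "adding" returns a new list and leaves the old one alone.
        ImmutableList<int> frozen = ImmutableList.Create(1, 2, 3);
        ImmutableList<int> extended = frozen.Add(4);
        Console.WriteLine(frozen.Count);               // 3
        Console.WriteLine(extended.Count);             // 4
    }
}
```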

This should be enough now to understand why the practice you observed for C# is the way it is.

This collection vs array design should be considered a strength for C#, rather than a weakness.

Formally specifying the various collections gives the programmer the power and guidance to use the collection type that is actually appropriate to the situation, with less encouragement to fall back to a baseline catch-all array type. Additionally, it gives the programmer power to use a real array when appropriate, which after all exists for a reason and can have certain nice performance wins when working with truly low-level code, such as interoperating with low-level operating system or network constructs.
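
One place a real array still earns its keep is as a reusable byte buffer for I/O. The following is only a sketch: `BufferDemo` and `CountBytes` are hypothetical names, and a `MemoryStream` stands in for whatever file or network stream you would actually be reading.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferDemo
{
    // Reads the stream to the end through one fixed-size buffer that is
    // reused for every chunk, and returns the total number of bytes seen.
    public static int CountBytes(Stream source, int bufferSize = 4096)
    {
        byte[] buffer = new byte[bufferSize];
        int total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }
        return total;
    }

    public static void Main()
    {
        byte[] payload = Encoding.UTF8.GetBytes("hello, world");
        using (var stream = new MemoryStream(payload))
        {
            // A deliberately tiny buffer forces several reads.
            Console.WriteLine(CountBytes(stream, bufferSize: 4));   // 12
        }
    }
}
```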

Why then do so many developers (and API designers) seem to prefer using List in returned data sets?

I think I have explained why we don't use arrays, but in my opinion you are right to question this: the choice of List<T> in these specific scenarios is also incorrect and has led to a lot of inefficient C# data access code over the years.

Specifically, this choice creates a tendency for code to call .ToList() when it would not otherwise be necessary, or to manually create a list instance to return and add each record. It forces us to fully materialize data result sets when we might otherwise be able to limit memory use to one record at a time.

In most cases, we should be using IEnumerable<T> instead of List<T>. IQueryable<T> and (more recently) IAsyncEnumerable<T> are also good options. In any case, from here we either skip the .ToList() call or use an iterator instead of instantiating a list and adding records. Again, this allows us to set up data processing systems that stream data one record at a time, rather than materializing entire result sets as usually happens today. This could dramatically improve memory use (for what I hope are obvious reasons) and initial response times (because we can start streaming data as we receive the first record, instead of waiting for the last record to be added to a list).
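
Here is a small sketch of the difference. Everything in it is invented for illustration: `ReadRecords` is a stand-in for a real data reader, and the `RecordsProduced` counter exists only to show how many records were actually generated before the caller got its first result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Counts how many records the source has actually produced.
    public static int RecordsProduced;

    // Hypothetical data source: an iterator that yields one record at a time.
    private static IEnumerable<int> ReadRecords(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            RecordsProduced++;
            yield return i;
        }
    }

    // Common pattern: materialize every record before returning anything.
    public static List<int> GetAllAsList(int count) => ReadRecords(count).ToList();

    // Streaming alternative: hand the lazy sequence straight to the caller.
    public static IEnumerable<int> GetAllStreaming(int count) => ReadRecords(count);

    public static void Main()
    {
        RecordsProduced = 0;
        int first = GetAllAsList(1000).First();
        Console.WriteLine($"List:      first={first}, produced={RecordsProduced}");
        // List:      first=1, produced=1000

        RecordsProduced = 0;
        first = GetAllStreaming(1000).First();
        Console.WriteLine($"Streaming: first={first}, produced={RecordsProduced}");
        // Streaming: first=1, produced=1
    }
}
```

The caller that only needs the first record pays for all 1000 with the list version and for exactly one with the iterator. A caller that really does need a list can still call .ToList() itself; the point is that the decision moves to the code that knows whether it is necessary.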

This also displays the weakness of my argument that individual collection types are a strength: the various types are only a strength to the degree programmers understand them and make good choices. In practice, we often still fall back to a baseline List<T> collection, whether or not it's really the right option.

But as one final counter-point, this is still no worse than what happens on other platforms, and at least makes it a little easier to do the right thing now and then.

answered Jun 23 at 15:20

Joel Coehoorn
