ToList method in Linq

Question

If I am not wrong, the ToList() method iterate on each element of provided collection and add them to new instance of List and return this instance.Suppose an example

//using linq list = Students.Where(s => s.Name == "ABC").ToList(); //traditional way foreach (var student in Students) { if (student.Name == "ABC") list.Add(student); }

I think the traditional way is faster, as it loops only once, where as of above of Linq iterates twice once for Where method and then for ToList() method.

The project I am working on now has extensive use of Lists all over and I see there is alot of such kind of use of ToList() and other Methods that can be made better like above if I take list variable as IEnumerable and remove .ToList() and use it further as IEnumerable.

Do these things make any impact on performance?

Consider Where as an if-clause in a loop, not as a loop. Actually the Where will be used when ToList enumerates the sequence. — Tim Schmelter
– Tim Schmelter, Commented Feb 22, 2013 at 15:31

svick · Accepted Answer · 2013-02-22 15:48:56Z

Do these things make any impact on performance?

That depends on your code. Most of the time, using LINQ does cause a small performance hit. In some cases, this hit can be significant for you, but you should avoid LINQ only when you know that it is too slow for you (i.e. if profiling your code showed that LINQ is reason why your code is slow).

But you're right that using ToList() too often can cause significant performance problems. You should call ToList() only when you have to. Be aware that there are also cases where adding ToList() can improve performance a lot (e.g. when the collection is loaded from database every time it's iterated).

Regarding the number of iterations: it depends on what exactly do you mean by “iterates twice”. If you count the number of times MoveNext() is called on some collection, then yes, using Where() this way leads to iterating twice. The sequence of operations goes like this (to simplify, I'm going to assume that all items match the condition):

Where() is called, no iteration for now, Where() returns a special enumerable.
ToList() is called, calling MoveNext() on the enumerable returned from Where().
Where() now calls MoveNext() on the original collection and gets the value.
Where() calls your predicate, which returns true.
MoveNext() called from ToList() returns, ToList() gets the value and adds it to the list.
…

What this means is that if all n items in the original collection match the condition, MoveNext() will be called 2n times, n times from Where() and n times from ToList().

Good description (as long as this is LINQ to Objects). LINQ to SQL/EF will only iterate over the resulting datatable.
@JimWooley Yeah, I assumed this is LINQ to objects, that's what the question seems to be about (though it doesn't say it explicitly).
This is wrong no matter if its LINQ to objects or LINQ to SQL/EF. You can see my answer if you are interested in why. MoveNext is NOT called 2n times. There is only one iteration made.

Evelie · Accepted Answer · 2013-02-22 17:17:46Z

var list = Students.Where(s=>s.Name == "ABC");

This will only create a query and not loop the elements until the query is used. By calling ToList() will first then execute the query and thus only loop your elements once.

List<Student> studentList = new List<Student>(); var list = Students.Where(s=>s.Name == "ABC"); foreach(Student s in list) { studentList.add(s); }

this example will also only iterate once. Because its only used once. Keep in mind that list will iterate all students everytime its called.. Not only just those whose names are ABC. Since its a query.

And for the later discussion Ive made a testexample. Perhaps its not the very best implementation of IEnumable but it does what its supposed to do.

First we have our list

public class TestList<T> : IEnumerable<T> { private TestEnumerator<T> _Enumerator; public TestList() { _Enumerator = new TestEnumerator<T>(); } public IEnumerator<T> GetEnumerator() { return _Enumerator; } System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { throw new NotImplementedException(); } internal void Add(T p) { _Enumerator.Add(p); } }

And since we want to count how many times MoveNext is called we have to implement our custom enumerator aswel. Observe in MoveNext we have a counter that is static in our program.

public class TestEnumerator : IEnumerator { public Item FirstItem = null; public Item CurrentItem = null;

 public TestEnumerator() { } public T Current { get { return CurrentItem.Value; } } public void Dispose() { } object System.Collections.IEnumerator.Current { get { throw new NotImplementedException(); } } public bool MoveNext() { Program.Counter++; if (CurrentItem == null) { CurrentItem = FirstItem; return true; } if (CurrentItem != null && CurrentItem.NextItem != null) { CurrentItem = CurrentItem.NextItem; return true; } return false; } public void Reset() { CurrentItem = null; } internal void Add(T p) { if (FirstItem == null) { FirstItem = new Item<T>(p); return; } Item<T> lastItem = FirstItem; while (lastItem.NextItem != null) { lastItem = lastItem.NextItem; } lastItem.NextItem = new Item<T>(p); } }

And then we have a custom item that just wraps our value

public class Item<T> { public Item(T item) { Value = item; } public T Value; public Item<T> NextItem; }

To use the actual code we create a "list" with 3 entries.

 public static int Counter = 0; static void Main(string[] args) { TestList<int> list = new TestList<int>(); list.Add(1); list.Add(2); list.Add(3); var v = list.Where(c => c == 2).ToList(); //will use movenext 4 times var v = list.Where(c => true).ToList(); //will also use movenext 4 times List<int> tmpList = new List<int>(); //And the loop in OP question foreach(var i in list) { tmpList.Add(i); } //Also 4 times. }

And conclusion? How does it hit performance? The MoveNext is called n+1 times in this case. Regardless of how many items we have. And also the WhereClause does not matter, he will still run MoveNext 4 times. Because we always run our query on our initial list. The only performance hit we will take is the actual LINQ framework and its calls. The actual loops made will be the same.

And before anyone asks why its N+1 times and not N times. Its because he returns false the last time when he is out of elements. Making it the number of elements + end of list.

If you compare the two samples, the one with LINQ really does iterate (i.e. call MoveNext() for each element) the collection twice: once in Where() and once in ToList(). (Though the second time, it's likely to be a smaller collection.) This usually won't affect performance much, but it could have an impact.
No because as I mentioned its just a query thats not executed. Its executed first when you actually use it. Similar to how it would look if you used foreach(var v in list) after. It would still only run once.
Yes, the Where() is only execute when you iterate the result. But when you do that, Where() iterates the original collection and ToList() iterates the collection that's returned from Where(). So you do iterate twice.
@svick: The Where does not iterate, only the ToList does.
@svick Can you give me a link somewhere where I can read more on this. Im not telling you that you are wrong. But if I'm wrong I really would like to know it..

Jim Wooley · Accepted Answer · 2013-02-22 16:42:02Z

To answer this completely, it depends on the implementation. If you are talking about LINQ to SQL/EF, there will be only one iteration in this case when .ToList is called, which internally calls .GetEnumerator. The query expression is then parsed into TSQL and passed to the database. The resulting rows are then iterated over (once) and added to the list.

In the case of LINQ to Objects, there is only one pass through the data as well. The use of yield return in the where clause sets up a state machine internally which keeps track of where the process is in the iteration. Where does NOT do a full iteration creating a temporary list and then passing those results to the rest of the query. It just determines if an item meets a criteria and only passes on those that match.

What about when you use a projection, like a aList.Select(i => new B(i.Name)).ToList(); Does it loop twice with LINQ to Objects?
@codewise No, the iterator yields new values in the single foreach iteration used by ToList.

SWeko · Accepted Answer · 2013-02-22 15:35:24Z

First of all, Why are you even asking me? Measure for yourself and see.

That said, Where, Select, OrderBy and the other LINQ IEnumerable extension methods, in general, are implemented as lazy as possible (the yield keyword is used often). That means that they do not work on the data unless they have to. From your example:

var list = Students.Where(s => s.Name == "ABC");

won't execute anything. This will return momentarily even if Students is a list of 10 million objects. The predicate won't be called at all until the result is actually requested somewhere, and that is practically what ToList() does: It says "Yes, the results - all of them - are required immediately".

There is however, some initial overhead in calling of the LINQ methods, so the traditional way will, in general, be faster, but composability and the ease-of-use of the LINQ methods, IMHO, more than compensate for that.

If you like to take a look at how these methods are implemented, they are available for reference from Microsoft Reference Sources.

Collectives™ on Stack Overflow

ToList method in Linq

4 Answers 4

3 Comments

13 Comments

2 Comments

Comments

Linked

Hot Network Questions