I'm working on an application that takes quite a bit of time to initialize its data.
Some background: I'm creating a sort of pivot table in which I turn Figure 1 below into Figure 2. I've gone through the code with Stopwatches and have been able to isolate a single line that takes 2 seconds during program load (it is in a loop executed for roughly 200-300 rows of data; the 2 seconds is the total time across all iterations). That line is marked in the code example below, Figure 3.
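In case it matters how I measured it, the timing was done roughly along these lines. This is a simplified, self-contained sketch rather than my actual code; the data and the timed statement are just stand-ins:

```
using System;
using System.Diagnostics;
using System.Linq;

class TimingSketch
{
    static void Main()
    {
        // Accumulate time for one statement across every iteration of the loop,
        // instead of timing the loop as a whole.
        var rows = Enumerable.Range(0, 250).ToArray();   // stand-in for my ~200-300 rows
        var lineTimer = new Stopwatch();

        foreach (var row in rows)
        {
            // ... other per-row work ...

            lineTimer.Start();
            var status = rows.FirstOrDefault(r => r == row);   // stand-in for the suspect line
            lineTimer.Stop();
        }

        Console.WriteLine($"Suspect line, total: {lineTimer.ElapsedMilliseconds} ms");
    }
}
```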
My question is whether there is a better way to structure the method below so that I can avoid whatever overhead that line is causing. It is worth noting that when that line executes, mChars sometimes contains 0 elements and at most 1 (I originally said up to around 5; see the edit below), so I can't imagine what would make it take so much time to execute.
Figure 1
```
SAMPLENUMBER | ANALYSIS_NAME | ANALYSIS_STATUS
----------------------------------------------
1234         | NO3           | I
1234         | SO4           | C
5678         | NO3           | C
```

Figure 2
```
SAMPLENUMBER | NO3 | SO4
------------------------
1234         | I   | C
5678         | C   |
```

Figure 3
```
private List<string> GetPivotData(DataRow sourceRow, DataTable sourceTable)
{
    int pivotStartIndex = sourceTable.Columns.IndexOf(Cols.HotDate) + 1;
    var retList = new List<string>(sourceTable.Columns.Count - pivotStartIndex);

    for (int i = pivotStartIndex; i < sourceTable.Columns.Count; i++)
    {
        string[] elementTest = sourceTable.Columns[i].ColumnName.Split('-');

        var mChars = from s in mainData.AsEnumerable()
                     where s.Field<string>(Cols.Samplenumber) == sourceRow.Field<string>(Cols.Samplenumber)
                        && s.Field<string>(Cols.ElementName) == elementTest[0].Trim()
                        && s.Field<string>(Cols.AnalysisNumber) == elementTest[1].Trim()
                     select s.Field<string>(Cols.AnalysisStatus);

        retList.Add(mChars.FirstOrDefault()); // <-- this is the line taking so much time.
    }

    return retList;
}
```

I've tried things like having retList as an array rather than a list, as well as calling .ToList() on my LINQ result and using .Find(x => true), and they really haven't had any effect.

The method in Figure 3 is itself inside a for loop, called roughly 200 times, once per sample. The for loop inside the method iterates on average 5 times, so that line of code gets called about 1000 times, depending on the data. That means the average time for the line is around 2 ms, which may just be too small in itself to do anything with. Swapping my for loops for Parallel.For halves the time it takes, but I've found that my logic doesn't work well with concurrency. At all.
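To give an idea of the kind of restructuring I'm wondering about, here is a rough, untested sketch that builds a lookup from mainData once, before the per-sample loop, and replaces the per-column LINQ scan with a dictionary hit. BuildStatusLookup, statusLookup and the "|"-separated key are names and a format I made up for the sketch, and it assumes each samplenumber/element/analysis-number combination appears at most once in mainData:

```
// Same class as Figure 3; relies on the existing mainData table and Cols names.
// Needs: using System.Collections.Generic; using System.Data; using System.Linq;

// Built once, before the per-sample loop.
private Dictionary<string, string> BuildStatusLookup()
{
    var lookup = new Dictionary<string, string>();
    foreach (DataRow s in mainData.Rows)
    {
        string key = s.Field<string>(Cols.Samplenumber) + "|" +
                     s.Field<string>(Cols.ElementName) + "|" +
                     s.Field<string>(Cols.AnalysisNumber);
        lookup[key] = s.Field<string>(Cols.AnalysisStatus);   // last row wins if a key repeats
    }
    return lookup;
}

private List<string> GetPivotData(DataRow sourceRow, DataTable sourceTable,
                                  Dictionary<string, string> statusLookup)
{
    int pivotStartIndex = sourceTable.Columns.IndexOf(Cols.HotDate) + 1;
    var retList = new List<string>(sourceTable.Columns.Count - pivotStartIndex);

    for (int i = pivotStartIndex; i < sourceTable.Columns.Count; i++)
    {
        string[] elementTest = sourceTable.Columns[i].ColumnName.Split('-');
        string key = sourceRow.Field<string>(Cols.Samplenumber) + "|" +
                     elementTest[0].Trim() + "|" +
                     elementTest[1].Trim();

        statusLookup.TryGetValue(key, out string status);   // status stays null when there is no match
        retList.Add(status);                                 // same "add null when not found" behavior as before
    }

    return retList;
}
```

The idea would be that the cost of scanning mainData is paid once up front instead of once per sample per column, but I haven't verified that this is actually where the overhead comes from.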
Edit: I should clarify why I'm using FirstOrDefault(), I suppose, and I need to correct an error in my initial post as well. The error is that I said mChars can contain up to 5 elements when FirstOrDefault() is called (it is retList that has up to 5); it will only ever have 0 or 1 elements. When it's empty, I still need to add a null to the list, so I'm looking into a way to add the mChars result directly to the list without any search at all.
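Just to illustrate what I mean by adding the result directly: since mChars only ever yields 0 or 1 elements, something like the AddRange form below appends exactly one entry either way. As far as I can tell, though, the query is deferred, so the scan over mainData happens inside whichever call consumes it, not in FirstOrDefault() itself:

```
// mChars is the deferred LINQ query from Figure 3; it only ever yields 0 or 1 statuses.

// Current form: adds the status, or null when mChars is empty.
retList.Add(mChars.FirstOrDefault());

// Alternative without an explicit "get first" call: DefaultIfEmpty() turns an empty
// sequence into a single null, so AddRange() still appends exactly one entry.
// The deferred query is still enumerated once either way.
retList.AddRange(mChars.DefaultIfEmpty());
```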