0

I have the below snippet which takes a long time to run as the data increase.

OrderEntityColection is a List and samplePriceList is a List

OrderEntityColection = 30k trades samplePriceList = 1million prices

Takes easily 10-15 minute to finish or more

I have tested this with 1500 orders and 300k prices but it takes around 40-50 seconds as well and as the orders increase so do prices and even takes longer

Can you see how i can improve this. I have alreadyy cut it down to these numbers before in hand from a big set.

MarketId = int Audit = string

foreach (var tradeEntity in OrderEntityColection) { Parallel.ForEach(samplePriceList,new ParallelOptions {MaxDegreeOfParallelism = 8}, (price) => { if (price.MarketId == tradeEntity.MarketId) { if (tradeEntity.InstructionPriceAuditId == price.Audit) { // OrderExportColection.Enqueue(tradeEntity); count++; } } }); } 
12
  • 2
    If this is EntityFramework (or another ORM) you'll get a huge performance increase by just writing the logic as a simple bit of SQL. Also you'll have a race condition here with a simple count++ and threaded code. Commented Feb 23, 2018 at 17:33
  • Where is this data coming from? This looks like work best done on a database. Joining on a list of a million entries in memory is going to take a while. Commented Feb 23, 2018 at 17:35
  • ignore the count++ please as its not required here. Commented Feb 23, 2018 at 17:36
  • would tour inner loop comparisons also require that the index of both of your collections be the same? is that expected? Commented Feb 23, 2018 at 17:36
  • What state do you need from the loop if you are not using count? This is critically important. Commented Feb 23, 2018 at 17:36

3 Answers 3

1

So you want to do data in memory, ok - you need to be smart about the way you formulate the data up front. First thing is you're getting a list of prices by MarketId - so create that first:

var pricesLookupByMarketId = samplePriceList.ToDictionary( p => p.MarketId, v => v.ToDictionary(k => k.Market)); 

Now you have a Dictionary<int,Dictionary<int,Price>>(); (note ive assumed both MarketId and Audit are ints. If they're not it should still work)

Now your code becomes super simple and a lot faster

foreach (var tradeEntity in OrderEntityColection) { if(pricesLookupByMarketId.ContainsKey(tradeEntity.MarketId) && pricesLookupByMarketId[tradeEntity.MarketId].ContainsKey(tradeEntity.InstructionPriceAuditId)) { count++; } } 

Or, if you'er a fan of one long line

var count = OrderEntityColection.Count(tradeEntity => pricesLookupByMarketId.ContainsKey(tradeEntity.MarketId) && pricesLookupByMarketId[tradeEntity.MarketId].ContainsKey(tradeEntity.InstructionPriceAuditId)) 

As pointed out in the comments, this can be further optimized to stop repeated reads of the dictionaries - but the exact implementation depends on how you want to use this data in the end.

Sign up to request clarification or add additional context in comments.

6 Comments

to have even better solution, you can use dict.TryGetValue to not seek twice across dictionary buckets. Final complexity is O(N+M)
@DmitryPavliv im sure you're right, but do you think that should be done twice (one for the outer lookup and one for the inner)? I cant visualize the change....
it depends on what OP really wants to do. Taking into account their comment // OrderExportColection.Enqueue(tradeEntity); probably it's better to do for both dictionaries
I think it also could change the foreach with a parallel one. (If the Dictionary is read only it will further optimize the speed by dividing the work )
@Jamiec, one more correction, looks like you can have multiple prices with same Market / Audit. So dictionary should be Dictionary<int,Dictionary<int,List<Price>>> which can be found by using groupby
|
0

In the parallel loop you have cases, where you skip the processing for certain items. That's quite expensive, as you rely on that check to also happen on a separate thread. I'd just filter out the results first before processing those, as follows:

foreach (var tradeEntity in OrderEntityColection) { Parallel.ForEach(samplePriceList.Where(item=>item.MarketId == tradeEntity.MarketId && item.Audit == tradeEntity.InstructionPriceAuditId) ,new ParallelOptions {MaxDegreeOfParallelism = 8}, (price) => { // Do whatever processing is required here Interlocked.Increment(ref count); }); } 

On a side note, seems like you need to replace count++ with Interlocked.Increment(ref count), to be thread safe.

2 Comments

this still gives complexity of O(N*M)
Yes. I have optimized it from concurrency perspective, rather than from algorithm.
0

Manage to do this with the help of my friend

var samplePriceList = PriceCollection.GroupBy(priceEntity=> priceEntity.MarketId).ToDictionary(g=> g.Key,g=> g.ToList()); foreach (var tradeEntity in OrderEntityColection) { var price = samplePriceList[tradeEntity.MarketId].FirstOrDefault(obj => obj.Audit == tradeEntity.Audit); if (price != null) { count+=1; } } 

6 Comments

FirstOrDefault still enumerates the list for every item, you'd be better with a dictionary solution as with my answer
@jamiec This syntax doesnt seem to work for me var pricesLookupByMarketId = samplePriceList.ToDictionary( p => p.MarketId, v => v.ToDictionary(k => k.Market));
define "doesnt work". I had a typo Maybe it was supposed to be k.Audit at the end there
var pricesLookupByMarketId = samplePriceList.ToDictionary( p => p.MarketId, v => v.ToDictionary(k => k.Market)); It doesnt recognises k.Market. sorry im a newbie so trying to work out the correct syntax here and struggling
@Jamiec k. does not show any properties on intellisense. do i have to write and overload method in the class for ToDictonary ?
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.