
I have a list of ~80,000 items from a web service, and I need to work out which items to synchronize to a local database based on the following:

  • Insert data from the web service that is missing from the local db
  • Update local db rows that are stale compared with newer web service data
  • Delete from the local db anything no longer returned by the web service

It is taking 30 to 60 seconds to iterate through, specifically on the toInsert line. I don't consider 80k to be many records (the TickerV2 structure is about 10 small fields, mostly int).

I must be doing something horrendous. Any ideas on making this more performant, please?

// Requires Dapper (for connection.Query) and System.Collections.Immutable.

public class TickerV2
{
    public string Ticker { get; set; } // Ticker is the key by which we operate
    public string Name { get; set; }
    public Market Market { get; set; }
    public Locale Locale { get; set; }
    public TickerType Type { get; set; }
    public bool Active { get; set; }
    public string PrimaryExch { get; set; }
    public DateTime Updated { get; set; }
    public CurrencyCodes Currency { get; set; }
    // note: Market, Locale, and Currency are all enums but not indexed
}

async Task SaveTickersToDatabaseAsync(IEnumerable<TickerV2> web)
{
    using var connection = new SqlConnection(this.dbConnectionString);
    await connection.OpenAsync();

    var db = connection.Query<TickerV2>("SELECT * FROM Tickers").ToList();

    var dbHashset = db.Select(x => x.Ticker).ToImmutableHashSet();
    var webHashset = web.Select(x => x.Ticker).ToImmutableHashSet();

    var toDeleteTickers = dbHashset.Except(webHashset).ToList();
    var toInsertTickers = webHashset.Except(dbHashset).ToList();

    var toInsert = web.Where(x => toInsertTickers.Contains(x.Ticker)).ToList();

    var toUpdate = db
        .Join(web,
              dbItem => dbItem.Ticker,
              webItem => webItem.Ticker,
              (dbItem, webItem) => new { Db = dbItem, Web = webItem })
        .Where(joined => joined.Web.Updated > joined.Db.Updated)
        .Select(x => x.Web)
        .ToList();
}

UPDATE USING DICTIONARY

I got a massive speed increase using the code below... I guess previously Contains (which is a sequential search?) was being run on each Where iteration - is this statement correct?

Code becomes:

var toInsert = new List<TickerV2>();
var webDictionary = web.ToImmutableDictionary(x => x.Ticker);

toInsert.AddRange(from tickerKey in toInsertTickers
                  select webDictionary[tickerKey]);

But I'm not sure, in the context of the question and the other operators, whether this is the best way?
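
For reference, a minimal sketch of how the same dictionary idea could cover all three operations; this assumes web and db are the collections from SaveTickersToDatabaseAsync and that Ticker is unique within each (ToDictionary throws on duplicate keys):

// Sketch only: one dictionary per side, so every membership test is a hash lookup
// instead of a linear scan of a List<string>.
var webByTicker = web.ToDictionary(x => x.Ticker);
var dbByTicker = db.ToDictionary(x => x.Ticker);

// In web but not yet in the db -> insert.
var toInsert = webByTicker.Values
    .Where(w => !dbByTicker.ContainsKey(w.Ticker))
    .ToList();

// In the db but no longer returned by the web service -> delete.
var toDelete = dbByTicker.Values
    .Where(d => !webByTicker.ContainsKey(d.Ticker))
    .ToList();

// Present on both sides but newer in the web data -> update.
var toUpdate = webByTicker.Values
    .Where(w => dbByTicker.TryGetValue(w.Ticker, out var d) && w.Updated > d.Updated)
    .ToList();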

  • If there is a unique property or properties in Ticker, the queries can be optimized using Join and GroupJoin. If you can post the TickerV2 type, maybe one can suggest something. Commented Jul 11, 2020 at 15:37
  • Thanks, I have updated the question to add further details. Commented Jul 11, 2020 at 15:40
  • Your observation is correct. Your search was sequential, and that was the reason for the poor performance. This line is the culprit: toInsertTickers.Contains(x.Ticker), as it scans the complete collection in the worst case just to test membership (see the sketch below). Commented Jul 11, 2020 at 16:31
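
A minimal sketch of that point, assuming the same variables as in the question: simply not materialising the key set into a List keeps Contains as a hash lookup rather than a linear scan.

// Sketch: keep the keys as a set instead of calling ToList(),
// so Contains is a hash lookup rather than a scan of a List<string>.
var toInsertKeys = webHashset.Except(dbHashset);   // still an ImmutableHashSet<string>
var toInsert = web.Where(x => toInsertKeys.Contains(x.Ticker)).ToList();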

1 Answer


Queries can be optimized as follows. Also, loading so many entries into memory may cause significant memory pressure; try to apply the filter at the database instead. I am not a database expert so I can't suggest a query, though a rough, untested sketch of that idea follows after the comparer below. I believe converting to hash sets is not required, as it adds overhead just for the comparison.

IEnumerable<TickerV2> web = new TickerV2[0];
IEnumerable<TickerV2> db = new TickerV2[0];

var entriesMissingFromDb = web.Except(db, new TickerV2Comparer());

var toInsert = db.Join(entriesMissingFromDb,
        _db => _db.Ticker,
        _web => _web.Ticker,
        (_db, _web) => _web)
    .ToList();

The comparer is as follows:

public class TickerV2Comparer : IEqualityComparer<TickerV2>
{
    public bool Equals(TickerV2 x, TickerV2 y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Ticker.Equals(y.Ticker);
    }

    public int GetHashCode(TickerV2 obj)
    {
        return obj.Ticker.GetHashCode();
    }
}
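
Not being a database expert, the following is only a rough, untested sketch of what applying the filter at the database could look like: bulk-copy the web snapshot into a staging table and let SQL Server work out the differences with a MERGE. The staging table dbo.TickersStaging and the webDataTable variable are hypothetical, and the column list is abbreviated.

// Inside something like SaveTickersToDatabaseAsync, after the connection is opened.
using var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.TickersStaging" };
await bulk.WriteToServerAsync(webDataTable); // webDataTable: a DataTable built from the web list

const string merge = @"
MERGE dbo.Tickers AS target
USING dbo.TickersStaging AS source
    ON target.Ticker = source.Ticker
WHEN MATCHED AND source.Updated > target.Updated THEN
    UPDATE SET Name = source.Name, Updated = source.Updated /* ...other columns... */
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Ticker, Name, Updated /* ...other columns... */)
    VALUES (source.Ticker, source.Name, source.Updated /* ... */)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;";

using var cmd = new SqlCommand(merge, connection);
await cmd.ExecuteNonQueryAsync();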

3 Comments

Thanks Neelesh. In the case of toInsert, this would need to be the TickerV2 items that are in the web list but not in the db list. I believe your Select(a => a._web) returns the entries common to both, but there should not be any in common (since db does not have the web data yet, hence the need to insert)?
Thank you kindly. Note I have updated my question as well; using a dictionary I got a massive performance increase (less than a second, compared to 30+ seconds before).
The same equality comparer can be used for the delete operation too, to find the entries exclusive to the db (see the sketch below). I hope you have got it working.
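
For completeness, a minimal sketch of that, assuming web and db are the collections from the question:

var comparer = new TickerV2Comparer();

// In web but not in the db -> candidates to insert.
var toInsert = web.Except(db, comparer).ToList();

// In the db but no longer in web -> candidates to delete.
var toDelete = db.Except(web, comparer).ToList();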
