1

Let's say I have a collection of 10,000 objects that I need to add to a database using Entity Framework (I recognize that EF isn't well-suited to this task, but let's run with it for now). For the purposes of this question, we'll make the following assumptions:

  1. There is only one table, with an IDENTITY primary key.
  2. The table is empty.
  3. The objects are simple -- everything is a primitive data type (int, bool, string, etc.)

I could do this in Entity Framework in one of two ways:

    // Option 1
    foreach (var item in largeCollection)
    {
        _context.SomeTable.Add(item);
    }
    _context.SaveChanges();

    // Option 2
    _context.SomeTable.AddOrUpdate(largeCollection);
    _context.SaveChanges();

Is the performance of one method inherently better or worse than the other? Or do they both devolve into an equal number of single-line INSERT statements?

In other words, from a performance standpoint, is there any advantage to using Add() over AddOrUpdate() (or vice-versa) when inserting multiple items into a database?

2 Answers

5

The best answer is to use AddRange. However, Add is by far more performant than AddOrUpdate.

AddOrUpdate

AddOrUpdate performs a database round trip for every entity to check whether it already exists in the destination table.

So even if your table is empty, calling AddOrUpdate on 10,000 objects performs 10,000 database round trips just to check whether the data exists.

Add

The Add method adds an entity to the change tracker and calls the DetectChanges method after every record is added.

So if you add 10,000 objects, the DetectChanges method will be called 10,000 times, which can take more than a minute if you have a few relationships.

See: Performance-Add
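
If you have to stay with Add for some reason, a commonly used workaround is to switch off automatic change detection while the loop runs. The following is only a sketch under assumptions: it relies on EF6's `Configuration.AutoDetectChangesEnabled` flag, and the class and method names (`AddLoopSketch`, `AddAllWithoutAutoDetect`) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Data.Entity;

public static class AddLoopSketch
{
    // Hypothetical helper: stages every item with Add while automatic
    // change detection is switched off, then restores the previous setting.
    public static void AddAllWithoutAutoDetect<TEntity>(
        DbContext context, IEnumerable<TEntity> items)
        where TEntity : class
    {
        bool previous = context.Configuration.AutoDetectChangesEnabled;
        context.Configuration.AutoDetectChangesEnabled = false;
        try
        {
            DbSet<TEntity> set = context.Set<TEntity>();
            foreach (TEntity item in items)
            {
                // With auto-detect off, Add does not trigger DetectChanges.
                set.Add(item);
            }
        }
        finally
        {
            // Restore the flag so a later SaveChanges behaves as usual.
            context.Configuration.AutoDetectChangesEnabled = previous;
        }
    }
}
```

With the question's names, the call would look like `AddLoopSketch.AddAllWithoutAutoDetect(_context, largeCollection);` followed by `_context.SaveChanges();`. This only removes the repeated DetectChanges cost; it does not change how many INSERT statements SaveChanges sends.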

AddRange

The AddRange method will add all entities and will call the DetectChanges method once after all entities are added.

So if you add 10,000 objects, the DetectChanges method will be called once.

    _context.SomeTable.AddRange(largeCollection);

In all of these cases, once you call SaveChanges, 10,000 additional database round trips will be performed to save the entities, which can be quite slow as well.
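
To put rough numbers on the three approaches, here is a toy cost model. It does not touch Entity Framework at all; it only restates the counts given above as arithmetic. The class and method names are hypothetical, and the scan estimate assumes that each DetectChanges call walks every entity tracked so far, which is a simplification.

```csharp
using System;

public static class InsertCostModel
{
    // Database commands as described above: AddOrUpdate issues one existence
    // check per entity, and SaveChanges then issues one INSERT per entity
    // regardless of which method staged the entities.
    public static long RoundTrips(long entityCount, bool usesAddOrUpdate)
    {
        long existenceChecks = usesAddOrUpdate ? entityCount : 0;
        long inserts = entityCount;
        return existenceChecks + inserts;
    }

    // DetectChanges calls as described above: one per entity for an Add loop,
    // one in total for AddRange.
    public static long DetectChangesCalls(long entityCount, bool usesAddRange)
    {
        return usesAddRange ? 1 : entityCount;
    }

    // Assumption: each DetectChanges call scans every entity tracked so far,
    // so an Add loop scans 1 + 2 + ... + n entities in total.
    public static long EntitiesScannedByAddLoop(long entityCount)
    {
        return entityCount * (entityCount + 1) / 2;
    }
}
```

For 10,000 entities this model gives 20,000 round trips for AddOrUpdate against 10,000 for Add or AddRange, and roughly 50 million entity scans for the Add loop, which is why the loop gets slow well before SaveChanges is even called.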

Disclaimer: I'm the owner of the project Entity Framework Extensions

(This library is NOT free)

This library can make your code more efficient by allowing you to save multiple entities at once. All bulk operations are supported:

  • BulkSaveChanges
  • BulkInsert
  • BulkUpdate
  • BulkDelete
  • BulkMerge
  • BulkSynchronize

Example:

    // Easy to use
    context.BulkSaveChanges();

    // Easy to customize
    context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

    // Perform Bulk Operations
    context.BulkDelete(customers);
    context.BulkInsert(customers);
    context.BulkUpdate(customers);

    // Customize Primary Key
    context.BulkMerge(customers, operation => {
        operation.ColumnPrimaryKeyExpression = customer => customer.Code;
    });

1

So this question discusses briefly the difference between an "update" vs an "insert" in terms of database commands:

Cost of Inserts vs Update in SQL Server

Furthermore, according to the official MSDN documentation (https://msdn.microsoft.com/en-us/library/hh846520(v=vs.103).aspx), AddOrUpdate performs what is called an "upsert", which is basically a fancy way of saying: update the row if it exists and insert the row if it doesn't.

So with this information now, it would seem logical that Add() is the better method. Furthermore, given that this specific application is being used to populate a database initially (if I'm wrong on this assumption please correct me), it would seem as though doing an AddOrUpdate() is pointless because there is nothing to update.

5 Comments

If anything, the OP should just test and measure both scenarios in a production-like test setting and then analyze why both are slow, or why one is better on the database but worse for memory pressure, or some other crazy scenario. The rest is pure speculation.
Sure they can time it and I would agree that's the safest approach to roll with, but there are differences in speed between inserts and updates at the end of the day and I was just giving a general answer with a bit of literature to help OP better understand what is going on behind the scenes.
Well, the main advantage to AddOrUpdate is that it can take a collection of items. That's what made me think of it. Where you would need to loop through the collection for an Add, calling Add once per item, you can just pass AddOrUpdate the entire collection all at once and let it handle it. That's what prompted my question. (And yep, I'm populating a database here, which is why I need to do this all at once.)
Have you checked out AddRange() yet? It takes in a collection of entities to add all at once. msdn.microsoft.com/en-us/library/…
@AriRoth It's as simple as that: if you want to insert records, use the Add or AddRange methods. In the end, whether you use Add, AddRange or AddOrUpdate, the number of INSERT commands will be the same. AddOrUpdate will additionally execute the same number of SELECT commands, though, in order to determine whether it needs to add or update. That's the whole purpose of that method (to add or update), not that you can pass a collection.
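
For anyone who wants to check the command counts discussed in these comments rather than take them on faith, something along these lines could help. It is only a sketch: it assumes EF6, whose `Database.Log` property accepts an `Action<string>`, and the names `SaveTimingSketch` and `TimeSave` are hypothetical.

```csharp
using System;
using System.Data.Entity;
using System.Diagnostics;

public static class SaveTimingSketch
{
    // Hypothetical helper: runs the staging step (Add, AddRange or
    // AddOrUpdate), then SaveChanges, echoing the SQL that EF sends to the
    // console and returning the elapsed time.
    public static TimeSpan TimeSave(DbContext context, Action stageEntities)
    {
        Action<string> previousLog = context.Database.Log;
        context.Database.Log = Console.Write; // EF6 logging hook
        try
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            stageEntities();
            context.SaveChanges();
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }
        finally
        {
            // Put back whatever logger was configured before.
            context.Database.Log = previousLog;
        }
    }
}
```

A call might look like `SaveTimingSketch.TimeSave(_context, () => _context.SomeTable.AddRange(largeCollection));`. Writing every command to the console slows things down, so the timings are only useful for comparing one approach against another under the same logging, not as absolute numbers.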
