
I have a table with 100,000+ records. The customer has asked that we encrypt the username field, copy the encrypted value to a new field, and clear the original username field. The encryption can only be performed in the application, not in the database.

The existing codebase has used Entity Framework for tasks like this in the past, but never on a table of this size. The legacy code looked something like this:

    foreach (var phone in db.Phones)
    {
        phone.Enc_Serial = Encrypt(phone.Serial);
        phone.Serial = "";
    }
    db.SaveChanges();

Given this is a bulk update, would there be any advantage to doing this with a raw SQL command? I'm thinking that at least we wouldn't have a ton of tracked objects sitting in the DbContext consuming memory.

    var idsAndSerials = db.Phones.Select(p => new { id = p.Id, serial = p.Serial });
    foreach (var item in idsAndSerials)
    {
        string sql = String.Format("Update phone set Enc_Serial = '{0}' where phoneId = {1}",
            Encrypt(item.serial), item.id.ToString());
        db.Database.ExecuteSqlCommand(sql);
    }
Comments:

  • Out of curiosity, why does it matter? Is your legacy method failing to complete in an acceptable amount of time? Have you profiled the difference between the two methods for 1K records? Commented Apr 15, 2015 at 21:00
  • A foreach loop issuing one single UPDATE at a time would take a long time. I would advise constructing the table on the C# end, inserting it into a temp table on the database, and then running one single UPDATE. Commented Apr 15, 2015 at 21:00
  • If you do this with a "raw SQL command", then technically you would be performing the encryption in the database and not in the application (since SQL commands are requested by the client but actually executed by the DBMS server). Commented Apr 15, 2015 at 21:00
  • Adding to @SurgeonofDeath's suggestion, you could also use a table-valued parameter and a stored procedure (a sketch follows these comments). It wouldn't get rid of your memory concerns on the .NET end, but it would get rid of the multiple updates happening on the SQL end. But then I'd ask again: is your legacy method really performing that badly? Commented Apr 15, 2015 at 21:04
  • @DanField No, the legacy function is not performing poorly. A colleague is concerned about the amount of memory such a large number of records would consume during the operation, so I thought I would throw it out to you good folks on SO for input. Commented Apr 15, 2015 at 21:07
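
To make the temp-table / table-valued-parameter suggestions above concrete, here is a minimal sketch. It assumes a user-defined table type and stored procedure exist on the SQL Server side (the names dbo.EncSerialType and dbo.UpdateEncSerials are illustrative, not from the question), and it keeps the encryption in the application:

    // Assumed to exist on the SQL Server side (illustrative DDL, not from the question):
    //   CREATE TYPE dbo.EncSerialType AS TABLE (PhoneId INT PRIMARY KEY, EncSerial NVARCHAR(256));
    //   CREATE PROCEDURE dbo.UpdateEncSerials @Items dbo.EncSerialType READONLY AS
    //       UPDATE p SET p.Enc_Serial = i.EncSerial, p.Serial = ''
    //       FROM phone p JOIN @Items i ON p.phoneId = i.PhoneId;

    using System.Data;
    using System.Data.SqlClient;
    using System.Linq;

    // Encrypt in the application and collect the results in a DataTable.
    // Projecting to an anonymous type means EF does not track these rows.
    var table = new DataTable();
    table.Columns.Add("PhoneId", typeof(int));
    table.Columns.Add("EncSerial", typeof(string));

    foreach (var item in db.Phones.Select(p => new { p.Id, p.Serial }))
    {
        table.Rows.Add(item.Id, Encrypt(item.Serial));
    }

    // Send the whole set to the server in one round trip as a table-valued parameter.
    var itemsParam = new SqlParameter("@Items", SqlDbType.Structured)
    {
        TypeName = "dbo.EncSerialType",
        Value = table
    };
    db.Database.ExecuteSqlCommand("EXEC dbo.UpdateEncSerials @Items", itemsParam);

The single server-side UPDATE replaces the per-row round trips, and the only thing held in memory is the DataTable of id/ciphertext pairs rather than 100,000+ tracked entities.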

2 Answers


In the example you've provided, no. You're still iterating through each record and issuing an UPDATE per row. At least in the first example, I believe the statements all run within a single SaveChanges call and are wrapped in a transaction, so that all updates succeed or none of them do.
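
If the raw-SQL route is taken anyway, that all-or-nothing behaviour has to be recreated explicitly. A minimal sketch, assuming EF6 (where Database.BeginTransaction and ExecuteSqlCommand are available); it also parameterises the statement instead of building it with String.Format, and clears Serial as the requirement asks:

    // Materialise just the ids and plaintext serials (a small projection, not tracked by EF).
    var idsAndSerials = db.Phones
                          .Select(p => new { p.Id, p.Serial })
                          .ToList();

    using (var tx = db.Database.BeginTransaction())
    {
        foreach (var item in idsAndSerials)
        {
            // Positional values are turned into parameters named p0, p1 by EF.
            db.Database.ExecuteSqlCommand(
                "UPDATE phone SET Enc_Serial = @p0, Serial = '' WHERE phoneId = @p1",
                Encrypt(item.Serial), item.Id);
        }
        tx.Commit();   // every row succeeds, or the whole batch rolls back
    }

This still issues one UPDATE per row; it only restores the transactional guarantee, not the round-trip cost, which is why the comments above suggest pushing the whole set to the server at once.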




Since this is a significant update, I'd suggest creating a tracking table (on the SQL side) in which you sequentially number each of the rows to be updated (and also store the row's PK value). Also include a column in the tracking table that lets you mark the row as done (say 0 or 1). Set a foreign key into the original table via the PK value.

Update your data model on the EF side to include the new tracking table. Now you have a new table that will easily let you retrieve, say, 1K record batches to work on at a time. This won't have excessive memory consumption. The application logic can do the encrypting. As you update those records, mark the ones that are updated as "done" in your tracking table.

Get the next 1K of not-done records via the tracking table (using navigation properties to reach the real records), and repeat.

It'll be done very fast and with no undue load. 100,000+ records is not really a lot, especially with a divide-and-conquer approach (100+ batches).
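
A minimal sketch of the loop described above, assuming the tracking table has been mapped as a TrackingRow entity with Sequence and Done columns and a Phone navigation property, and that MyDbContext is the EF context; all of those names are illustrative:

    using System.Data.Entity;   // for Include()
    using System.Linq;

    const int batchSize = 1000;
    bool more = true;

    while (more)
    {
        // A fresh context per batch keeps the number of tracked entities small.
        using (var db = new MyDbContext())
        {
            var batch = db.TrackingRows
                          .Include(t => t.Phone)      // navigation property to the real record
                          .Where(t => !t.Done)
                          .OrderBy(t => t.Sequence)
                          .Take(batchSize)
                          .ToList();

            more = batch.Count > 0;

            foreach (var row in batch)
            {
                row.Phone.Enc_Serial = Encrypt(row.Phone.Serial);   // encryption stays in the app
                row.Phone.Serial = "";
                row.Done = true;                                    // mark the tracking row as done
            }

            db.SaveChanges();   // one transaction per ~1K batch
        }
    }

Because each pass only materialises a thousand tracking rows and their phones, memory use stays flat, and the Done flag makes the job restartable if it is interrupted partway through.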

