
I have two sets of records: one read from an XML file and the other stored in the database.

The rule is: if a record from the XML already exists in the database, exclude it.

I have thought of two options:

The first is to loop over the records from the XML and, for each record, check whether its id exists in the database.
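
A minimal Python sketch of the first option, assuming the records are identified by an integer `id` and stored in a table named `records` (both hypothetical); an in-memory SQLite database stands in for the real one. It issues one query per XML record:

```python
import sqlite3

def filter_new_per_record(conn, xml_ids):
    """Option 1: one query per XML record (N round trips)."""
    new_ids = []
    for record_id in xml_ids:
        # Ask the database whether this id already exists.
        row = conn.execute(
            "SELECT 1 FROM records WHERE id = ?", (record_id,)
        ).fetchone()
        if row is None:
            new_ids.append(record_id)
    return new_ids

# Hypothetical setup: a "records" table that already holds ids 1, 2, 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO records (id) VALUES (?)", [(1,), (2,), (3,)])

print(filter_new_per_record(conn, [2, 4, 5]))  # [4, 5]
```

With a primary key (or index) on `id` each lookup is cheap, but the total cost grows with the number of XML records, and against a remote database every iteration pays a network round trip.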

The second is to create a list of objects from the XML and a list of objects from the database, then compare the two lists to get the result.
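
A Python sketch of the second option, under the same hypothetical `records` table with an integer `id`: one query loads every existing id into a hash set, and the XML ids are filtered against it. The set is only a stand-in for the LINQ comparison the question mentions, not a C# implementation:

```python
import sqlite3

def filter_new_in_memory(conn, xml_ids):
    """Option 2: load all existing ids once, then compare in memory."""
    # Single query: pull every existing id into a set (O(1) average lookups).
    existing = {row[0] for row in conn.execute("SELECT id FROM records")}
    # Keep XML ids that are not already in the database, preserving order.
    return [record_id for record_id in xml_ids if record_id not in existing]

# Hypothetical setup: a "records" table that already holds ids 1, 2, 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO records (id) VALUES (?)", [(1,), (2,), (3,)])

print(filter_new_in_memory(conn, [2, 4, 5]))  # [4, 5]
```

This trades N queries for a single query, at the price of holding every existing id in memory, which grows with the size of the table.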

Which one is more efficient? I'm leaning towards the second option: instead of looping and querying the database for each record to see whether it exists, why not load both sets into lists and compare them using LINQ or an equality comparer?
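
The "equality comparer" idea, sketched in Python: a hypothetical `Record` type whose equality and hash depend only on `id`, so objects loaded from the XML and from the database count as the same record when their ids match. This is a rough analogue of passing a custom `IEqualityComparer<T>` to LINQ, not the C# API itself:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Record:
    # Equality and hashing use only "id"; "name" is excluded via compare=False,
    # playing the role of a custom equality comparer keyed on id.
    id: int
    name: str = field(compare=False)

def exclude_existing(xml_records, db_records):
    """Return XML records whose id does not appear among the database records."""
    existing = set(db_records)
    return [r for r in xml_records if r not in existing]

xml_records = [Record(1, "from xml"), Record(4, "new")]
db_records = [Record(1, "from db"), Record(2, "other")]

print(exclude_existing(xml_records, db_records))
# [Record(id=4, name='new')]
```

Keying equality on the id alone means the other fields may differ between the two sources without affecting the comparison.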

  • What is efficiency? The shortest path between two points is a straight line, so efficiency means keeping the path as short as possible. The shortest way to compare two bytes is bit by bit; there is no shortcut. To compare two things, they must be in the same format. Your train of thought is correct. Commented Feb 4, 2016 at 8:20
  • Do you need to do any additional work in the application with the duplicates, or can you simply attempt to insert all entries and let the database reject the duplicates by virtue of unique constraints? Commented Feb 4, 2016 at 9:07
  • Doing the work in memory is definitely going to be faster than querying the database on each record. Commented Feb 4, 2016 at 9:21
  • @DavidPacker Are you suggesting querying for all records, doing the filtering on the application side, and inserting the new ones? That's race-prone (when multiple instances run) and very costly as the number of rows in the database grows. Commented Feb 4, 2016 at 10:21
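
A Python sketch of the alternative raised in the comments: let a unique constraint on the id reject duplicates instead of checking for them in the application. It uses SQLite's `INSERT OR IGNORE`; other databases spell this differently (for example `INSERT ... ON CONFLICT DO NOTHING` in PostgreSQL), and the table layout is hypothetical. Because the database enforces the constraint, two concurrent importers cannot both insert the same id, which also addresses the race condition mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY acts as the unique constraint that rejects duplicates.
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO records (id, name) VALUES (?, ?)",
                 [(1, "existing"), (2, "existing")])

# Hypothetical rows parsed from the XML; id 2 is already in the table.
xml_rows = [(2, "dup from xml"), (3, "new from xml")]

# SQLite-specific: INSERT OR IGNORE skips rows that would violate the
# constraint, so no application-side existence check is needed.
before = conn.total_changes
conn.executemany("INSERT OR IGNORE INTO records (id, name) VALUES (?, ?)", xml_rows)
inserted = conn.total_changes - before
conn.commit()

rows = conn.execute("SELECT id, name FROM records ORDER BY id").fetchall()
print(inserted)  # 1
print(rows)      # [(1, 'existing'), (2, 'existing'), (3, 'new from xml')]
```

If the application does need to know which XML records were duplicates, this approach alone does not tell you, which is why the comment asks about it.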



Which one is efficient? That depends on your definition of efficient and on your data.

Which kind of efficiency do you mean: "less processing time", "less memory consumption", or "fewer hours needed to implement and maintain"?

I assume you mean "less processing time".

The answer depends on the sizes of the two data sets:

  • If the database contains 1 million rows and the XML contains only two, the first option ("let the database check") is more efficient.
  • If both the database and the XML contain many items, the second ("in memory") solution can be more efficient, provided the data is properly sorted.
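
To illustrate why sorting matters for the in-memory approach: if both id lists are sorted (the database side could be fetched with `ORDER BY id`), a single merge-style pass finds the XML ids missing from the database in O(n + m) time instead of comparing every pair in O(n·m). A minimal Python sketch, assuming plain integer ids:

```python
def exclude_sorted(xml_ids, db_ids):
    """Return ids in xml_ids that are absent from db_ids.

    Both inputs must be sorted ascending. One merge-style pass costs O(n + m).
    """
    result = []
    j = 0
    for x in xml_ids:
        # Advance the database cursor past ids smaller than x.
        while j < len(db_ids) and db_ids[j] < x:
            j += 1
        # Keep x only if the database has no matching id.
        if j == len(db_ids) or db_ids[j] != x:
            result.append(x)
    return result

print(exclude_sorted([2, 4, 5, 9], [1, 2, 3, 5, 8]))  # [4, 9]
```

The database cursor `j` never moves backwards, which is what keeps the pass linear.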

In 99.99% of all cases, processing-time efficiency does not matter as long as the code is not executed more than 10 times per second. Is there really a big difference if version 2 needs 0.01 seconds while version 1 needs 0.2 seconds?

If I were you, I would implement the simpler option ("let the database check") until you have actually measured that it is a performance bottleneck. Be aware of the anti-pattern of premature optimisation.
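
Measuring is cheap. Below is a rough Python timing harness, assuming the same hypothetical `records` table and an in-memory SQLite database. Real timings depend heavily on the network latency to the actual database, which an in-process SQLite cannot reproduce, so treat this only as a template for timing your own setup:

```python
import sqlite3
import time

def per_record(conn, xml_ids):
    # Option 1: one SELECT per XML id.
    return [i for i in xml_ids
            if conn.execute("SELECT 1 FROM records WHERE id = ?", (i,)).fetchone() is None]

def in_memory(conn, xml_ids):
    # Option 2: one SELECT for all ids, then a set lookup per XML id.
    existing = {row[0] for row in conn.execute("SELECT id FROM records")}
    return [i for i in xml_ids if i not in existing]

def measure(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical data: 100,000 existing ids and 1,000 XML ids, half of them new.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO records (id) VALUES (?)", ((i,) for i in range(100_000)))
xml_ids = list(range(99_500, 100_500))

r1, t1 = measure(per_record, conn, xml_ids)
r2, t2 = measure(in_memory, conn, xml_ids)
assert r1 == r2  # both approaches must agree before the timings mean anything
print(f"per-record: {t1:.4f}s  in-memory: {t2:.4f}s")
```

Varying the two sizes (few XML rows against a large table, or many against many) shows the trade-off described in the list above.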

  • Being afraid of premature optimization should not stop you from using the correct and efficient algorithm for the problem you're facing. Commented Feb 4, 2016 at 11:53
