
I am using a 4-core processor and implementing a scenario with Parallel.ForEach. I have a large record set in a database, and using parallel processing I am trying to update some values in those records.

I have divided the record collection into small subsets and am updating each of them.

Approach 1: I divided the collection into 4 subsets (as I have 4 cores) and processed them in parallel.

But I was wondering: if I divide the collection into a larger number of subsets (say 100), will my records update faster?

My understanding is that the records will not update faster, because I have only 4 cores, and the extra subsets would only add context switching. So the resulting time would be longer than with the first approach.

Please confirm.

  • It depends. Your routine involves not only your 4-core CPU, but also the RDBMS (database) and the network. For instance (my case): a 2-core CPU, 1 Gb network, and a 32-CPU Superdome (with Oracle 11.2 RDBMS on it) showed the best performance with about 30 threads at night and about 10 during the day. Commented Mar 18, 2016 at 7:22
  • Your question is very subjective and depends a lot on the underlying hardware and other processes running on the machine. The only way to know is to test it by implementing some basic diagnostics to measure performance while increasing the number of subsets. Commented Mar 18, 2016 at 7:27
  • It's not clear what sort of processing you're trying to do. If possible, do all of the processing down in the database, with queries that treat the data as a set rather than thinking about individual rows/updates. That's the sort of processing databases are designed for. Commented Mar 18, 2016 at 7:27

2 Answers


Parallel.ForEach already schedules each iteration onto different cores if they're available. You don't need to divide your data into subsets to get parallelism.
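A minimal sketch of that idea (the `Record` type and `UpdateRecord` method are placeholders for your own): pass the full collection and let the runtime partition it.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

class Updater
{
    // Hands the whole collection to Parallel.ForEach; the runtime
    // partitions it and schedules iterations across available cores.
    public static void UpdateAll(IEnumerable<Record> records)
    {
        Parallel.ForEach(records, record =>
        {
            UpdateRecord(record); // your per-record update logic
        });
    }

    static void UpdateRecord(Record record) { /* ... */ }
}

class Record { public int Id; public string Value; }
```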

For me, the main bottleneck here isn't your CPU but the fact that you're working with a database. Most RDBMS and NoSQL engines are designed for high-demand scenarios, but your commands still have to go over the wire to reach your database server.

If I'm not mistaken, you should open more than one pooled database connection, and each parallel iteration should issue its command on one of these connections. This ensures that you'll be able to send the database commands in parallel as well.
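One way to sketch this, assuming SQL Server via `System.Data.SqlClient` (the connection string, table, and column names here are illustrative): opening a connection inside each iteration draws from the ADO.NET connection pool, so each iteration gets its own connection and the commands can travel to the server concurrently.

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;

// connectionString, Records table, and column names are assumptions
// for the sake of the example.
Parallel.ForEach(records, record =>
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "UPDATE Records SET Value = @value WHERE Id = @id", conn))
    {
        cmd.Parameters.AddWithValue("@value", record.Value);
        cmd.Parameters.AddWithValue("@id", record.Id);
        conn.Open();           // taken from the connection pool
        cmd.ExecuteNonQuery(); // each iteration uses its own connection
    }
});
```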


I would not worry too much about partitioning the data yourself. .NET uses adaptive partitioning for parallel loops under the covers, and that should be sufficient for most cases, if not all; I have not yet come across a single real-life case where a custom partitioner was needed.

With parallel processing in .NET, just keep in mind that if your loop iterations are long (i.e., more than a second waiting on I/O-bound operations or doing long calculations), you may see a spike in the number of worker threads. The .NET thread pool cannot distinguish between the case where all threads are blocked and the case where they are actually doing work, so it starts injecting threads to avoid thread starvation. That may not be what you need. You can limit the number of concurrent threads using the ParallelOptions.MaxDegreeOfParallelism property.
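A short sketch of capping the loop (`records` and `UpdateRecord` are placeholders): limiting the degree of parallelism to the core count keeps thread-pool injection from oversubscribing the CPU during long iterations.

```csharp
using System;
using System.Threading.Tasks;

var options = new ParallelOptions
{
    // Cap concurrency at the number of logical cores (4 in the
    // question's scenario) instead of letting threads pile up.
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.ForEach(records, options, record =>
{
    UpdateRecord(record); // long-running work per record
});
```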

If your parallel loops are doing I/O calls, I would normally recommend creating tasks for all the I/O operations and awaiting them at the end via Task.WhenAll. In that case you do not even need parallelism, since you are simply creating tasks that represent the I/O requests; you can create these tasks sequentially and await them all at the end.
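A sketch of that pattern, assuming a hypothetical async database call `UpdateRecordAsync`: each task represents an in-flight I/O request, so no parallel loop is needed.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

async Task UpdateAllAsync(IEnumerable<Record> records)
{
    // Start one task per record; the tasks represent pending I/O,
    // not threads doing work.
    var tasks = records.Select(r => UpdateRecordAsync(r)).ToList();

    // Await them all; control returns when every update completes.
    await Task.WhenAll(tasks);
}
```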
