1

I have a list of objects and need to process each one of them in the least amount of time possible.

Since the processing of each object is independent of the others, we've decided to do it in parallel with Parallel.ForEach.

Parallel.ForEach(hugeObjectList, new ParallelOptions { MaxDegreeOfParallelism = 50 }, obj => DoSomeWork(obj) ); 

Setting a huge number for ParallelOptions.MaxDegreeOfParallelism (e.g. 50 or 100) seems unreasonable to me, so how can we find the optimal number of parallel tasks to crunch this list?

Does Parallel.ForEach start each DoSomeWork call on a different core? (So, since we have 4 cores, would the correct degree of parallelism be 4?)

8
  • Have you taken a look at the remarks for the MaxDegreeOfParallelism property? learn.microsoft.com/en-us/dotnet/api/… Commented May 28, 2018 at 11:11
  • 1
    First thing to do is read the documentation, second thing to do is run some tests and play with the setting. Empirical evidence goes a long way Commented May 28, 2018 at 11:14
  • 2
    Is the work that you are doing computationally expensive? Is there a lot of IO? Are you downloading things? Commented May 28, 2018 at 11:15
  • 1
    Doing parallel calls to the same database might not necessarily increase the performance, it might even do the opposite. Commented May 28, 2018 at 11:25
  • 1
    If the work is strictly CPU-bound, then using <number of cores> threads will likely achieve the best results. If the work is mostly IO, you should not use Parallel.ForEach in the first place. Commented May 28, 2018 at 11:31

3 Answers 3

1

I think this says it all:

By default, For and ForEach will utilize however many threads the underlying scheduler provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used.

MSDN


2 Comments

So the right value for MaxDegreeOfParallelism is the one that we don't need to specify... thanks!
@Doc Don't specify it, or set it explicitly only if your code needs it. In most cases the system will find the best value for you. If you set it too high for a CPU-bound task, you can overload the CPU, which makes performance worse.
1

Asking the platform should get you close to the optimum (for CPU bound work).

new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, 

Doing nothing is another very good option, i.e.

//new ParallelOptions { MaxDegreeOfParallelism = 50 }, 
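A minimal sketch of both options side by side, assuming CPU-bound work. The class name, the `Run` helper and the body of `DoSomeWork` are hypothetical stand-ins, not the asker's code. A `ParallelOptions` instance created without arguments has `MaxDegreeOfParallelism` set to -1, which means "no explicit limit".

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class DegreeOfParallelismDemo
{
    // Hypothetical stand-in for the question's DoSomeWork: pure CPU work, no I/O.
    public static long DoSomeWork(int obj)
    {
        long sum = 0;
        for (int i = 0; i < 100000; i++)
        {
            sum += (i ^ obj) % 7;
        }
        return sum;
    }

    // Runs the loop with the given options and returns how many items were processed.
    public static int Run(IEnumerable<int> items, ParallelOptions options)
    {
        int processed = 0;
        Parallel.ForEach(items, options, obj =>
        {
            DoSomeWork(obj);
            // The counter is shared between threads, so it is incremented atomically.
            Interlocked.Increment(ref processed);
        });
        return processed;
    }
}
```

Calling `Run(items, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })` caps the loop at one concurrent task per logical processor, while `Run(items, new ParallelOptions())` leaves the decision to the scheduler.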

Edit

there's a lot of io with a database ...

That makes MaxDegreeOfParallelism = 1 another very good candidate. Or maybe 2.

What you really should be looking into is async/await and async database calls. Not the Parallel class.
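As a rough illustration of that direction, here is a sketch that throttles asynchronous calls with a `SemaphoreSlim`. Everything in it is an assumption made for the example: `DoSomeWorkAsync` is a hypothetical async version of the question's method, and `Task.Delay` only simulates database latency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncThrottleDemo
{
    // Hypothetical stand-in for an async database call; Task.Delay simulates I/O latency.
    public static async Task<int> DoSomeWorkAsync(int obj)
    {
        await Task.Delay(10);
        return obj * 2;
    }

    // Processes all items while keeping at most maxConcurrency calls in flight.
    public static async Task<int[]> ProcessAllAsync(IEnumerable<int> items, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            List<Task<int>> tasks = items.Select(async obj =>
            {
                await gate.WaitAsync();      // wait for a free slot without blocking a thread
                try
                {
                    return await DoSomeWorkAsync(obj);
                }
                finally
                {
                    gate.Release();          // free the slot even if the call throws
                }
            }).ToList();

            // Results come back in the same order as the input items.
            return await Task.WhenAll(tasks);
        }
    }
}
```

With this shape no thread sits blocked while a call is pending, and `maxConcurrency` limits the load on the database rather than the number of threads. A suitable value would still have to be found by measuring.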

2 Comments

No, it should not. If DoSomeWork() contains a CPU-bound algorithm, it is better to avoid setting MaxDegreeOfParallelism manually. Otherwise, if the work is I/O bound, you can choose a degree of parallelism less than or equal to Environment.ProcessorCount; your code will utilize all logical processors and have no extra context switches, but each task will wait for its whole I/O operation instead of letting non-I/O-bound operations run in other tasks.
I can't make out all of this. The answer does try to draw attention to the IO vs CPU question. But "will utilize all logical processors" won't happen in any scenario.
1

The only way to know for sure is to test it. More threads do not equal better performance, and often yield worse performance. Some thoughts:

  1. Designing an algorithm for a single thread, and then adding Parallel.For around it is pointless. You must change your algorithm to take advantage of multiple threads or the benefits to parallel processing will be minor or negative.

  2. If you are reading from disk or downloading data over a network connection where the server is able to feed you as fast as you get the data, you may find that a producer/consumer pattern performs best. If the processing is computationally expensive, use many consumer threads (I tend to use Num Cores - 2. One for the UI, one for the producer). If not computationally expensive, it won't matter how many consumer threads you use.

  3. If you are downloading data from the Internet from a variety of sources, and the servers take time to respond, you should start up quite a few threads (50-100 is not crazy). This is because the threads will just sit there waiting for the server to respond.
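One way to run such a test is to time the same loop under several candidate values. This is only a sketch: `ParallelismBenchmark`, `Measure` and the body of `DoSomeWork` are hypothetical, and a serious measurement would add warm-up runs and repetitions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ParallelismBenchmark
{
    // Hypothetical workload; replace with the real DoSomeWork to get meaningful numbers.
    public static double DoSomeWork(int obj)
    {
        double x = obj;
        for (int i = 0; i < 50000; i++)
        {
            x = Math.Sqrt(x + i);
        }
        return x;
    }

    // Times one full pass over the items for each candidate degree of parallelism.
    // Each degree must be -1 (no explicit limit) or a positive number.
    public static Dictionary<int, TimeSpan> Measure(IReadOnlyList<int> items, IEnumerable<int> degrees)
    {
        var results = new Dictionary<int, TimeSpan>();
        foreach (int degree in degrees)
        {
            var options = new ParallelOptions { MaxDegreeOfParallelism = degree };
            Stopwatch stopwatch = Stopwatch.StartNew();
            Parallel.ForEach(items, options, obj => { DoSomeWork(obj); });
            stopwatch.Stop();
            results[degree] = stopwatch.Elapsed;
        }
        return results;
    }
}
```

For example, `Measure(items, new[] { 1, 2, 4, Environment.ProcessorCount, 50, -1 })` returns one elapsed time per setting, which can then be compared on the target machine with the real workload.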
