How do I determine an appropriate value for MaxDegreeOfParallelism when using Parallel.ForEachAsync

Question

The example Scott Hanselman gives on his blog for using Parallel.ForEachAsync in .NET 6 specifies the value of MaxDegreeOfParallelism as 3.

However, if unspecified, the default MaxDegreeOfParallelism is Environment.ProcessorCount. This makes sense for CPU-bound work, but for asynchronous I/O-bound work it seems like a poor choice for a default value.

If I'm doing something like in Scott's example below, but I want to do it as fast as possible, how should I determine the best value to use for MaxDegreeOfParallelism? Is it reasonable to specify this as int.MaxValue and just assume the TaskScheduler will do the most sensible thing when it comes to scheduling the work on the ThreadPool?

```csharp
ParallelOptions parallelOptions = new() { MaxDegreeOfParallelism = 3 };

await Parallel.ForEachAsync(userHandlers, parallelOptions, async (uri, token) =>
{
    var user = await client.GetFromJsonAsync<GitHubUser>(uri, token);

    Console.WriteLine($"Name: {user.Name}\nBio: {user.Bio}\n");
});
```
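One way to see what the default and int.MaxValue actually mean for a loop like this is to count how many loop bodies are running at the same time. The sketch below makes two assumptions: Task.Delay(50) stands in for the HTTP call, and MeasurePeakConcurrencyAsync is a hypothetical helper, not part of any library. With no MaxDegreeOfParallelism set, the peak should never exceed Environment.ProcessorCount; with int.MaxValue, Parallel.ForEachAsync keeps starting workers while items remain, so the peak can approach the item count, which is effectively an unbounded burst against the remote side.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: runs itemCount fake I/O operations through
// Parallel.ForEachAsync and returns the highest number of loop bodies
// that were in flight at the same time.
static async Task<int> MeasurePeakConcurrencyAsync(ParallelOptions options, int itemCount)
{
    object gate = new();
    int inFlight = 0;
    int peak = 0;

    await Parallel.ForEachAsync(Enumerable.Range(0, itemCount), options, async (item, token) =>
    {
        lock (gate)
        {
            inFlight++;
            peak = Math.Max(peak, inFlight);
        }

        await Task.Delay(50, token); // stand-in for an HTTP round trip (assumption)

        lock (gate)
        {
            inFlight--;
        }
    });

    return peak;
}

const int ItemCount = 100;

int defaultPeak = await MeasurePeakConcurrencyAsync(new ParallelOptions(), ItemCount);
int threePeak = await MeasurePeakConcurrencyAsync(
    new ParallelOptions { MaxDegreeOfParallelism = 3 }, ItemCount);
int unboundedPeak = await MeasurePeakConcurrencyAsync(
    new ParallelOptions { MaxDegreeOfParallelism = int.MaxValue }, ItemCount);

Console.WriteLine($"Environment.ProcessorCount: {Environment.ProcessorCount}");
Console.WriteLine($"Peak with the default:      {defaultPeak}");
Console.WriteLine($"Peak with 3:                {threePeak}");
Console.WriteLine($"Peak with int.MaxValue:     {unboundedPeak}");
```

The exact peaks depend on the machine and on timing, which is part of the point: the default silently changes from one machine to the next.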

By the way, since no threads are involved in I/O operations at all (only in sending the initial request), I would say your MaxDegreeOfParallelism only affects how large your client's request burst will be. IMHO, this is a preference/optimization trade-off, roughly 70%/30% respectively.
– Ryan, Commented Mar 31, 2022 at 16:08
Does this answer your question? Factors for determining the degree of parallelism for the ForEachAsync
– Theodor Zoulias, Commented Sep 19, 2023 at 11:21

tmaj · Accepted Answer · 2022-03-31 00:18:38Z

IMHO, the only way to get the number is... testing.
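A small harness makes that testing repeatable. This is a sketch under stated assumptions: CallRemoteAsync is a hypothetical stand-in (Task.Delay(20) instead of a real request), and the candidate values are arbitrary. It runs the same workload once per candidate MaxDegreeOfParallelism and records the wall-clock time with Stopwatch.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real remote call. Swap in your HttpClient
// request against a realistic (non-prod) copy of the remote service.
static Task CallRemoteAsync(int item, CancellationToken token) => Task.Delay(20, token);

// Times the same workload once per candidate MaxDegreeOfParallelism.
static async Task<Dictionary<int, TimeSpan>> TimeCandidatesAsync(int[] candidates, int itemCount)
{
    var results = new Dictionary<int, TimeSpan>();
    foreach (int dop in candidates)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = dop };
        var stopwatch = Stopwatch.StartNew();

        await Parallel.ForEachAsync(Enumerable.Range(0, itemCount), options,
            async (item, token) => await CallRemoteAsync(item, token));

        results[dop] = stopwatch.Elapsed;
    }
    return results;
}

int[] candidates = { 1, 2, 4, 8, 16, 32 };
Dictionary<int, TimeSpan> timings = await TimeCandidatesAsync(candidates, itemCount: 40);

foreach (int dop in candidates)
{
    Console.WriteLine($"MaxDegreeOfParallelism = {dop,2}: {timings[dop].TotalMilliseconds,6:F0} ms");
}
```

With a fake call that never slows down, the timings will keep improving as the value grows. Against a real service, the useful signal is the point where adding parallelism stops helping or starts producing errors.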

For HTTP work, there are two parties involved:

your code
the remote side that does the work for you.

Your "fast" may be too fast for the remote side. This can be because of its resources and/or throttling.
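To illustrate the throttling side, here is a toy model, not a real server: RunAgainstThrottledServerAsync is hypothetical, it accepts at most serverCapacity concurrent requests and rejects the rest (think HTTP 429), and Task.Delay(30) stands in for server-side processing. When the client's MaxDegreeOfParallelism is at or below the capacity, no request can be rejected; above it, rejections start to appear.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical fake server: it accepts at most serverCapacity concurrent
// requests and rejects the rest. Returns the number of rejected requests.
static async Task<int> RunAgainstThrottledServerAsync(int clientDop, int serverCapacity, int itemCount)
{
    object gate = new();
    int inFlight = 0;
    int rejected = 0;

    var options = new ParallelOptions { MaxDegreeOfParallelism = clientDop };
    await Parallel.ForEachAsync(Enumerable.Range(0, itemCount), options, async (item, token) =>
    {
        bool accepted;
        lock (gate)
        {
            accepted = inFlight < serverCapacity;
            if (accepted) inFlight++;
            else rejected++;
        }

        if (!accepted)
        {
            return; // a real client would back off and retry here
        }

        await Task.Delay(30, token); // simulated server-side processing time

        lock (gate)
        {
            inFlight--;
        }
    });

    return rejected;
}

int rejectedWhenMatched = await RunAgainstThrottledServerAsync(clientDop: 4, serverCapacity: 4, itemCount: 40);
int rejectedWhenTooFast = await RunAgainstThrottledServerAsync(clientDop: 16, serverCapacity: 4, itemCount: 40);

Console.WriteLine($"Client 4 vs capacity 4:  {rejectedWhenMatched} rejected");
Console.WriteLine($"Client 16 vs capacity 4: {rejectedWhenTooFast} rejected");
```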

Note on the default

The default, which results in Environment.ProcessorCount, depends on the machine the code runs on. If you run your code in the cloud, this number may be different from what you get on your beefy laptop.

This can lead to unexpected differences between non-prod and prod environments.
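One way to avoid that surprise is to never rely on the default and read the value from configuration instead. A minimal sketch, assuming an environment variable with the hypothetical name HTTP_MAX_DOP and an arbitrary fallback of 3; use whatever configuration source your services already have.

```csharp
using System;
using System.Threading.Tasks;

// Returns the configured value if it is a positive integer, otherwise the fallback.
static int ParseMaxDegreeOfParallelism(string? rawValue, int fallback) =>
    int.TryParse(rawValue, out int value) && value > 0 ? value : fallback;

// HTTP_MAX_DOP is a hypothetical variable name.
int maxDop = ParseMaxDegreeOfParallelism(Environment.GetEnvironmentVariable("HTTP_MAX_DOP"), fallback: 3);
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = maxDop };

Console.WriteLine($"MaxDegreeOfParallelism = {parallelOptions.MaxDegreeOfParallelism} " +
                  $"(Environment.ProcessorCount on this machine = {Environment.ProcessorCount})");
```

This way, non-prod and prod run with the same, deliberately chosen value regardless of core count.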

GitHub specific

GitHub.com has a limit of 5,000 requests per hour for non-enterprise users (from here), and there is also this:

In order to provide quality service on GitHub, additional rate limits may apply to some actions when using the API. For example, using the API to rapidly create content, poll aggressively instead of using webhooks, make multiple concurrent requests, or repeatedly request data that is computationally expensive may result in secondary rate limiting.

In Best practices for integrators, we can read:

Dealing with secondary rate limits

Secondary rate limits are another way we ensure the API's availability. To avoid hitting this limit, you should ensure your application follows the guidelines below.

...

Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.
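Put together, those two constraints suggest sending GitHub requests one at a time and pacing them. 5,000 requests per hour averages out to one request every 0.72 s. The sketch below is one conservative way to do that, not GitHub's prescribed method: SendSeriallyWithSpacingAsync is hypothetical, sendAsync is a placeholder for the real call, and spacing requests evenly is an assumption (the hourly budget could also be spent in bursts). MaxDegreeOfParallelism = 1 with Parallel.ForEachAsync would also serialize the calls, but a plain loop makes the pacing simpler.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// 5,000 requests per hour averages out to one request every 720 ms.
TimeSpan hourlyLimitInterval = TimeSpan.FromHours(1) / 5000;

// Sends one request at a time and keeps at least minInterval between the
// starts of consecutive requests. sendAsync is a placeholder for the real call.
// Returns each request's start time, measured from when the method began.
static async Task<List<TimeSpan>> SendSeriallyWithSpacingAsync(
    IEnumerable<int> items,
    TimeSpan minInterval,
    Func<int, CancellationToken, Task> sendAsync,
    CancellationToken token = default)
{
    var startTimes = new List<TimeSpan>();
    var clock = Stopwatch.StartNew();

    foreach (int item in items)
    {
        if (startTimes.Count > 0)
        {
            TimeSpan wait = startTimes[startTimes.Count - 1] + minInterval - clock.Elapsed;
            if (wait > TimeSpan.Zero)
            {
                await Task.Delay(wait, token);
            }
        }

        startTimes.Add(clock.Elapsed);
        await sendAsync(item, token);
    }

    return startTimes;
}

Console.WriteLine($"Spacing needed for 5,000 requests/hour: {hourlyLimitInterval.TotalMilliseconds} ms");

// Short demo: 100 ms spacing, with Task.Delay standing in for the API call.
List<TimeSpan> starts = await SendSeriallyWithSpacingAsync(
    Enumerable.Range(0, 5),
    TimeSpan.FromMilliseconds(100),
    (item, token) => Task.Delay(10, token));
```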

I should have pointed out that calling GitHub was only an example. In my case, I'm building microservices where both services are owned by me. The question is really about the client side implications of using Parallel.ForEachAsync.
