
Consider the following code snippet and notice the difference in total runtime between setting numberTasksToSpinOff to 1 and then to 3, 4, or more (depending on the thread resources of your machine). I see much longer run times when spinning off more tasks.

I deliberately passed the same data collection into each worker instance, so that all worker tasks read from it at the same time. My understanding was that tasks can access a shared data structure without blocking as long as the operations are only reads or enumerations.

My goal is to spin off multiple tasks that iterate over the same shared data structure via read operations only and all complete at around the same time, regardless of the number of tasks spun off.

Edit: Please see the second code snippet, where I use Parallel.ForEach() and give each worker its own dataset, so different tasks/threads no longer access the same data structure. Yet I still see an unacceptable amount of overhead.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine($"Entry Main Function Thread Id: {Thread.CurrentThread.ManagedThreadId}");

        //run
        var task = Task.Run(async () =>
        {
            Console.WriteLine($"Entry RunMe Task Thread Id: {Thread.CurrentThread.ManagedThreadId}");
            await RunMe();
            Console.WriteLine($"Exit RunMe Task Thread Id: {Thread.CurrentThread.ManagedThreadId}");
        });
        task.Wait();

        Console.WriteLine($"Exit Main Function Thread Id: {Thread.CurrentThread.ManagedThreadId}");
        Console.WriteLine("Press key to quit");
        Console.ReadLine();
    }

    private static async Task RunMe()
    {
        var watch = new Stopwatch();
        var numberTasksToSpinOff = 6;
        var numberItems = 20000;
        var random = new Random((int)DateTime.Now.Ticks);
        var dataPoints = Enumerable.Range(1, numberItems).Select(x => random.NextDouble()).ToList();
        var tasks = new List<Task>();
        var workers = new List<Worker>();

        //structure workers: every worker shares the same dataPoints list
        for (int i = 1; i <= numberTasksToSpinOff; i++)
        {
            workers.Add(new Worker(i, dataPoints));
        }

        //start timer
        watch.Restart();

        //spin off tasks
        foreach (var worker in workers)
        {
            tasks.Add(Task.Run(() =>
            {
                Console.WriteLine($"Entry WorkerId: {worker.WorkerId} -> New Tasks spun off with in Thread Id: {Thread.CurrentThread.ManagedThreadId}");
                worker.DoSomeWork();
                Console.WriteLine($"Exit WorkerId: {worker.WorkerId} -> New Tasks spun off with in Thread Id: {Thread.CurrentThread.ManagedThreadId}");
            }));
        }

        //completion tasks
        await Task.WhenAll(tasks);

        //stop timer
        watch.Stop();
        Console.WriteLine($"Time it took to complete in Milliseconds: {watch.ElapsedMilliseconds}");
    }
}

public class Worker
{
    public int WorkerId { get; set; }

    private List<double> _data;

    public Worker(int workerId, List<double> data)
    {
        WorkerId = workerId;
        _data = data;
    }

    public void DoSomeWork()
    {
        //read-only pass over the shared list; each iteration copies the
        //remaining tail into a fresh list, which allocates heavily
        var indexPos = 0;
        foreach (var dp in _data)
        {
            var subSet = _data.Skip(indexPos).Take(_data.Count - indexPos).ToList();
            indexPos++;
        }
    }
}

Second Code Snippet:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        var watch = new Stopwatch();
        var numberTasksToSpinOff = 1;
        var numberItems = 20000;
        //var random = new Random((int)DateTime.Now.Ticks);
        //var dataPoints = Enumerable.Range(1, numberItems).Select(x => random.NextDouble()).ToList();
        var workers = new List<Worker>();

        //structure workers: each worker now builds its own private dataset
        for (int i = 1; i <= numberTasksToSpinOff; i++)
        {
            workers.Add(new Worker(i));
        }

        //start timer
        watch.Restart();

        //parallel work
        if (workers.Any())
        {
            var processorCount = Environment.ProcessorCount;
            var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = processorCount };
            Parallel.ForEach(workers, parallelOptions, DoSomeWork);
        }

        //stop timer
        watch.Stop();
        Console.WriteLine($"Time it took to complete in Milliseconds: {watch.ElapsedMilliseconds}");
        Console.WriteLine("Press key to quit");
        Console.ReadLine();
    }

    private static void DoSomeWork(Worker worker)
    {
        Console.WriteLine($"WorkerId: {worker.WorkerId} -> New Tasks spun off with in Thread Id: {Thread.CurrentThread.ManagedThreadId}");
        var indexPos = 0;
        foreach (var dp in worker.Data)
        {
            var subSet = worker.Data.Skip(indexPos).Take(worker.Data.Count - indexPos).ToList();
            indexPos++;
        }
    }
}

public class Worker
{
    public int WorkerId { get; set; }

    public List<double> Data { get; set; }

    public Worker(int workerId)
    {
        WorkerId = workerId;
        var numberItems = 20000;
        //note: time-based seeds taken in quick succession can coincide,
        //giving several workers identical data
        var random = new Random((int)DateTime.Now.Ticks);
        Data = Enumerable.Range(1, numberItems).Select(x => random.NextDouble()).ToList();
    }
}
  • Did you do anything at all to debug this yet? There's no evidence in your question that you've made any attempt at all to explain your observations. Use a profiler, figure out where all the time is being spent. I'll bet you find it's the garbage collector, because the main thing your tasks do is create a lot of garbage. Commented Feb 6, 2018 at 7:38
    @PeterDuniho, yes I have debugged the code, I even printed out thread IDs in-code. And you are incorrect, even when I just iterate the data structure without creating any additional data I see the same explosion in total execution time. I find your "close vote" going overboard in this case without first clarifying. If I had figured it all out I would not have posted a question. I thought this is exactly what this site is for. Not everyone may possess the same knowledge at this point in time as you do, hence the asking for advice. Commented Feb 6, 2018 at 7:44
    This seems like a reasonable question with some reasonable debate to me... not sure why it's being closed, especially if a solution might be found? Commented Feb 6, 2018 at 7:47

2 Answers


NOTE: The following answer is based on testing and observation, not definitive knowledge.

The more tasks you spin off, the more overhead you generate, and thus the total execution time rises. BUT if you look at it from another viewpoint, you will see that the number of data points actually processed per second increases the more tasks you spin up (up until you reach the limit of available hardware threads):

The following values were generated on my machine (4C/8T) with 10000 points per list:

  • 1 worker -> 1891 ms -> 5288 p/s
  • 2 worker -> 1921 ms -> 10411 p/s
  • 4 worker -> 2670 ms -> 14981 p/s
  • 8 worker -> 4871 ms -> 16423 p/s
  • 12 worker -> 7449 ms -> 16109 p/s

There you see that until I reach my "core limit" the throughput increases significantly, then up to my "thread limit" it still increases noticeably, but after that it decreases again, because of the added overhead and no more available hardware resources.
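For reference, the p/s column is just total points processed divided by wall-clock time. A minimal sketch of that arithmetic, using the 4-worker row from the list above; it is only meant to show how the numbers were derived:

    using System;

    // How the p/s column above is derived: total points processed across all
    // workers, divided by wall-clock seconds. Values below are the 4-worker row.
    int workerCount = 4;
    int pointsPerWorker = 10000;
    long elapsedMs = 2670;

    double pointsPerSecond = workerCount * pointsPerWorker / (elapsedMs / 1000.0);
    Console.WriteLine($"{workerCount} workers -> {elapsedMs} ms -> {pointsPerSecond:F0} p/s");
    // prints "4 workers -> 2670 ms -> 14981 p/s", matching the list above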


6 Comments

I agree with all your observations above, but as long as the number of hardware threads does not constrain the number of workers that run in parallel, your reasoning above does not hold. I am running on a dual-Xeon machine with 24 cores and 48 hyper-threads. The hardware in this case definitely does not constrain a job with 10 workers, yet its total execution time still ends up being a multiple of a job with 1 or 2 workers. Spinning off more tasks/threads costs a little, but not that much. Something else seems to be going on.
@MattWolf I did run the code through my profiler and all the increased time does come from the ToList call. It gets much slower with more threads running in parallel...
...because it allocates space that is then garbage collected as @Peter Duniho hinted at? I am profiling right now, using GC.TryStartNoGCRegion and GC.EndNoGCRegion
I am still stuck on this; preventing the garbage collector from working in the critical region did not yield any benefit at all. But apparently some other resource management by the CLR is still interfering, possibly the memory allocation for the created objects?
@MattWolf Does your production-code also use ToList? Maybe you can avoid using that to speed it up?
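Following up on the ToList discussion in the comments above: here is a minimal sketch (my own, not from the question) of a DoSomeWork variant that reads the same tail of the list on every iteration but indexes into the existing list instead of copying it, so no per-iteration garbage is created. The method name DoSomeWorkWithoutCopies and the running checksum are made up as an illustration, the checksum being a stand-in workload so the loop has an observable result:

    using System;
    using System.Collections.Generic;

    public class Worker
    {
        public int WorkerId { get; set; }

        private readonly List<double> _data;

        public Worker(int workerId, List<double> data)
        {
            WorkerId = workerId;
            _data = data;
        }

        // Same read pattern as DoSomeWork, but without Skip/Take/ToList:
        // indexing into the shared list allocates nothing per iteration.
        public double DoSomeWorkWithoutCopies()
        {
            double checksum = 0;
            for (int indexPos = 0; indexPos < _data.Count; indexPos++)
            {
                for (int j = indexPos; j < _data.Count; j++)
                {
                    checksum += _data[j];
                }
            }
            return checksum;
        }
    }

If the extra time really comes from allocation and collection pressure, a variant like this should scale much more evenly with the worker count.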

Have you had a look at Parallel Tasks? You could then do something like this.

e.g.:

if (workers.Any())
{
    var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
    Parallel.ForEach(workers, parallelOptions, DoSomeWork);
}

private static void DoSomeWork(Worker worker)
{
    //per-worker processing goes here
}

24 Comments

Caveat: ideally only if the workloads aren't IO-bound operations, i.e. things that use IO completion ports.
As far as I know, this becomes useful when there is a big workload for each operation, right?
I have used this specifically for long-running file IO tasks and it works well.
@MarkRedman, I will check it out but what causes the additional workload in my original code? Am I using async/await or tasks incorrectly? I am asking because later on I want to be able to add new workers/tasks dynamically during runtime without knowing beforehand how many workers/tasks I have at hand at any given point in time.
The way I understand it, using async/await as you are with WhenAll makes use of the resources on the same thread, whereas Parallel will do the work on individual threads.
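One note on the last comment, for what it's worth: as far as I know, Task.Run also queues each delegate to the thread pool, so both variants run the work on multiple pool threads; they differ mainly in partitioning and how the degree of parallelism is managed, not in "same thread" versus "individual threads". A minimal side-by-side sketch (the Work method is a made-up placeholder; the printed thread ids show neither variant stays on one thread):

    using System;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    class Comparison
    {
        static void Work(int id) =>
            Console.WriteLine($"Worker {id} on thread {Thread.CurrentThread.ManagedThreadId}");

        static async Task Main()
        {
            var ids = Enumerable.Range(1, 4).ToList();

            // Variant 1: Task.Run + WhenAll -- each delegate is queued to the
            // thread pool and may run on a different pool thread.
            await Task.WhenAll(ids.Select(id => Task.Run(() => Work(id))));

            // Variant 2: Parallel.ForEach -- also thread-pool based, but with
            // built-in partitioning and a degree-of-parallelism cap.
            Parallel.ForEach(ids, Work);
        }
    }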
