
I have the following C# code:

var rand = new Random(1);
var range = Enumerable.Range(1, 8);
var partition = Partitioner.Create(range, EnumerablePartitionerOptions.NoBuffering);

foreach (var x in partition
    .AsParallel()
    .AsOrdered()
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .WithDegreeOfParallelism(4)
    .Select(DoSomething))
{
    Console.WriteLine($"---- {x} {DateTime.Now.TimeOfDay} " +
        $"{Thread.CurrentThread.ManagedThreadId}");
}

int DoSomething(int x)
{
    Console.WriteLine($"WAIT {x} {DateTime.Now.TimeOfDay} " +
        $"{Thread.CurrentThread.ManagedThreadId}");
    int random;
    lock (rand) { random = rand.Next(2000); }
    Thread.Sleep(random); // fake work being done
    Console.WriteLine($"DONE {x} {DateTime.Now.TimeOfDay} " +
        $"{Thread.CurrentThread.ManagedThreadId}");
    return x;
}

Here is the output:

WAIT 2 13:51:08.8398170 10
WAIT 1 13:51:08.8398197 9
WAIT 3 13:51:08.8398132 11
WAIT 4 13:51:08.8398108 4
DONE 1 13:51:09.0805471 9   <-- start
WAIT 5 13:51:09.0808715 9
DONE 2 13:51:09.3504889 10
WAIT 6 13:51:09.3505787 10
DONE 3 13:51:09.7937364 11
WAIT 7 13:51:09.7939256 11
DONE 6 13:51:10.2208844 10
WAIT 8 13:51:10.2209660 10
DONE 4 13:51:10.3948195 4
---- 1 13:51:10.3951458 2   <-- end
DONE 5 13:51:10.4109264 9
---- 2 13:51:10.4112009 2
---- 3 13:51:10.4112443 2
DONE 7 13:51:10.5068458 11
---- 4 13:51:10.5068961 2
---- 5 13:51:10.5071686 2
---- 6 13:51:10.5072167 2
DONE 8 13:51:12.1163565 10
---- 7 13:51:12.1164506 2
---- 8 13:51:12.1165087 2

As you can see, about 1.3 seconds elapse between the moment item 1 finishes processing and the moment it gets printed to the console.

Because I explicitly asked for no buffering (neither in the partitioning nor in the merging of the results), I would have expected item 1 to be printed in the foreach loop as soon as it was processed (i.e. as soon as it left the DoSomething method). Is there some option I'm missing, or is this expected behavior?

  • What are you trying to do? Your own code blocks the worker threads, which means you aren't processing 4 concurrent calls at a time as you assumed. That Thread.Sleep(random); // fake work being done doesn't fake work; it wastes the ThreadPool thread. When you call Thread.Sleep, the thread is evicted immediately and will have to be scheduled for execution whenever the OS decides after it wakes up. Commented Nov 20 at 12:46
  • If you want to emulate work, use Thread.SpinWait (see the sketch after these comments). That puts the thread into a tight loop, so while it's busy, it doesn't get evicted. The docs explain the difference: A thread that calls Sleep yields the rest of its current slice of processor time, even if the specified interval is zero. Specifying a non-zero interval for Sleep removes the thread from consideration by the thread scheduler until the time interval has elapsed. Commented Nov 20 at 12:51
  • I do indeed block the worker threads, but there is nothing that prevents or blocks the main thread from outputting things to the console (inside the foreach loop). I have edited the OP to show the managed thread id. As you can see, it's a different thread from the worker threads. I expected the main thread / foreach loop to output things as soon as items are processed (one by one). Commented Nov 20 at 12:55
  • "there is nothing that prevents or blocks the main thread from outputting things to the console" is wrong. That thread is used to process data itself. I suspect you misunderstand how PLINQ and even Parallel.ForEach work. Both are blocking operations. Besides, that foreach is sequential and blocking, which means it can only proceed when the query produces output. And the query can't proceed because all the threads are evicted. Commented Nov 20 at 12:57
  • BTW, a PLINQ Select that doesn't select is a bug in itself. Parallel LINQ isn't a way of running things in parallel. It's built to process lots of data in parallel. It's specifically built to parallelize the entire query, not just the last, supposedly cheapest, part. It's built to create a pipeline of operators and use operators optimized for parallelism for all of them. Partitioning is used to reduce the need for synchronization between workers. The iteration is where the results are finally collected. By using AsOrdered() though, you're forcing extra sync and buffering at the end. Commented Nov 20 at 13:01
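For illustration, here is a minimal sketch of the Thread.SpinWait suggestion from the comments above. The iteration multiplier is an arbitrary value chosen for this example; it only approximates CPU-bound work and is not a calibrated equivalent of the original delay:

int DoSomething(int x)
{
    Console.WriteLine($"WAIT {x} {DateTime.Now.TimeOfDay} " +
        $"{Thread.CurrentThread.ManagedThreadId}");
    int random;
    lock (rand) { random = rand.Next(2000); }
    // Busy-wait instead of sleeping: the thread keeps running and is not
    // removed from the scheduler's consideration.
    // The multiplier is arbitrary, picked only for illustration.
    Thread.SpinWait(random * 100_000);
    Console.WriteLine($"DONE {x} {DateTime.Now.TimeOfDay} " +
        $"{Thread.CurrentThread.ManagedThreadId}");
    return x;
}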

1 Answer


The PLINQ engine uses threads from the ThreadPool and the current thread as worker threads. So when you specify WithDegreeOfParallelism(4), PLINQ invokes the Select delegate on 3 threads from the ThreadPool and on the current thread. The current thread is not standing idle waiting for other threads to do all the work. It's busy doing work. Now when you consume the results by enumerating the ParallelQuery<TResult>, the enumeration will happen exclusively on the current thread. So the current thread becomes overloaded, having to participate in the parallel processing of the Select delegate, and perform the enumeration of the results all alone. In your example what happens is that when a ThreadPool thread completes the processing of some item, the current thread is not immediately available to consume the result, because it's busy processing some other item.

As you can see, PLINQ is not a clean producer/consumer paradigm. There is no clear separation between the roles of the producer and the consumer. The consequence is that the results of the parallel processing are not immediately available to the consumer. There is latency in the system, which you can reduce with .WithMergeOptions(ParallelMergeOptions.NotBuffered) but not eliminate entirely.
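If the goal is for completed results to reach the consumer immediately, one way to separate the two roles explicitly can be sketched with Parallel.ForEach as the producer feeding a BlockingCollection<int> that the main thread drains. This is an illustration of the producer/consumer alternative, not part of the original query, and note that it yields results in completion order rather than in source order like AsOrdered:

// Requires System.Collections.Concurrent (already used for Partitioner above).
var results = new BlockingCollection<int>();

// Producer: run the parallel work on a background task and publish each
// result as soon as DoSomething returns it.
var producer = Task.Run(() =>
{
    try
    {
        Parallel.ForEach(
            Enumerable.Range(1, 8),
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            x => results.Add(DoSomething(x)));
    }
    finally
    {
        results.CompleteAdding(); // tell the consumer no more items are coming
    }
});

// Consumer: the main thread only consumes, so each item is printed
// as soon as it lands in the collection.
foreach (var x in results.GetConsumingEnumerable())
{
    Console.WriteLine($"---- {x} {DateTime.Now.TimeOfDay} " +
        $"{Thread.CurrentThread.ManagedThreadId}");
}

producer.Wait(); // surface any exception thrown by the producer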
