
I expect there are many possible solutions to this question. I can come up with a few myself, some clearly better than others, but none that I'm certain are optimal, so I'm interested in hearing from the real multi-threading gurus out there.

I have circa 100 pieces of work that can be executed concurrently, as there are no dependencies between them. If I execute them sequentially, my total execution time is approximately 1m 30s. If I queue each piece of work in the thread pool, it takes approximately 2m, which suggests to me that I am trying to do too much at once and that context switching between all these threads is negating the advantage of having them.

So, based on the assumption (please feel free to shoot me down if this is wrong) that if I only queue as many pieces of work at any one time as there are cores in my system (8 on this machine), I will reduce context switching and thus improve overall efficiency (other processes' threads notwithstanding, of course), can anyone suggest the optimal pattern/technique for doing this?
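For illustration, this is the sort of throttling pattern I have in mind (just a sketch, assuming .NET 4 for `CountdownEvent`; `DoWork` is a stand-in for one of my pieces of work):

```csharp
using System;
using System.Threading;

class ThrottledQueueing
{
    static void DoWork(int item) { /* stand-in for one independent piece of work */ }

    static void Main()
    {
        // Allow at most one in-flight work item per core.
        int maxConcurrent = Environment.ProcessorCount; // 8 on this machine
        var gate = new Semaphore(maxConcurrent, maxConcurrent);
        var done = new CountdownEvent(100);

        for (int i = 0; i < 100; i++)
        {
            int item = i;                  // capture the loop variable per item
            gate.WaitOne();                // block until a "slot" frees up
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try { DoWork(item); }
                finally { gate.Release(); done.Signal(); }
            });
        }

        done.Wait(); // block until all 100 items have finished
    }
}
```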

BTW I am using smartthreadpool.codeplex.com, but I don't have to.

  • I don't know the 'smart' thread pool, but it should try to keep the thread count low. How does the standard ThreadPool perform? Commented Dec 8, 2011 at 11:10
  • SmartThreadPool, eh? Why you'd want to use this over the .NET ThreadPool is a complete mystery. Can you enlighten me? Commented Dec 8, 2011 at 11:17
  • It has some nice syntactic sugar and I have used it before; no other reason. I have never compared or researched how it holds up against the native .NET implementation, so I am in the process of swapping it out now. Commented Dec 8, 2011 at 11:21
  • I remember SmartThreadPool from a good few years ago, and remember it as being very good. How it and the framework's pool have kept pace with each other, and for which situations, I couldn't say though. Commented Dec 8, 2011 at 11:24
  • 1
    And moving up a framework version isn't always impossible, so I think answers along 4.0 lines are worth including in the mix, alongside those that stay with 3.5. If nothing else, they could help other people apart from the OP). Commented Dec 8, 2011 at 11:48

2 Answers


A good thread pool already tries to keep one active thread per available core. That isn't the same as one worker thread per core, though: if a thread blocks (most classically on I/O), you want another thread to be using that core in the meantime.

The .NET thread pool would be worth trying instead, as would the Parallel class.
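For example, with .NET 4's Parallel class you can cap the concurrency explicitly (a sketch; `pieces` stands in for the OP's ~100 work items):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelSketch
{
    static void DoWork(int item) { /* one independent piece of work */ }

    static void Main()
    {
        var pieces = Enumerable.Range(0, 100); // stand-in for the work items

        // Let the runtime schedule the items, but cap the concurrency at the
        // core count so we don't drown in context switches.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
        Parallel.ForEach(pieces, options, item => DoWork(item));
    }
}
```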

If your CPU is hyper-threaded (8 virtual cores on 4 physical), this could be an issue. On average, hyper-threading makes things faster, but there are plenty of cases where it makes them worse. Try setting affinity to every other core and see if it gives you an improvement; if it does, this is likely a case where hyper-threading hurts.
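Affinity can be set from code (a sketch; the mask 0x55 assumes 8 logical cores numbered 0-7, with even/odd pairs sharing a physical core):

```csharp
using System;
using System.Diagnostics;

class AffinitySketch
{
    static void Main()
    {
        // 0x55 = binary 01010101: logical cores 0, 2, 4 and 6 only.
        // On a 4-core hyper-threaded CPU this pins the process to one
        // logical core per physical core, effectively disabling
        // hyper-threading for this process.
        Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)0x55;
    }
}
```

If the run is faster with this in place, hyper-threading is a likely culprit.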

Do you have to gather the results together again, or share any resources between the different tasks? The cost of doing this could well be greater than the savings from multi-threading. Perhaps some of it is unnecessary, though: for example, if you are locking on shared data but that data is only ever read, you don't actually need to lock with most data structures (most, but not all, are safe for concurrent reads if there are no writes).

The partitioning of the work could be an issue too. Say the single-threaded approach works its way through an area of memory, but the multi-threaded approach hands each thread its next bit of memory round-robin. Then there would be more cache flushing per core, as the "good next bit" for one core is actually being used by another. In this situation, splitting the work into bigger contiguous chunks can fix it.
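With .NET 4, range partitioning of this kind is easy to try (a sketch; `data` is a stand-in array and the per-element work is hypothetical):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ChunkSketch
{
    static void Main()
    {
        var data = new double[1000000];

        // Partitioner.Create(0, n) hands each worker a contiguous index range,
        // so each core streams through its own block of memory instead of
        // interleaving with the others and thrashing the caches.
        Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                data[i] = Math.Sqrt(i); // stand-in for per-element work
        });
    }
}
```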

There are plenty of other factors that can make a multi-threaded approach perform worse than a single-threaded one, but those are a few I can think of immediately.

Edit: If you are writing to a shared store, it could be worth trying a run where you just throw away any results. That could narrow down whether that's where the issue lies.


Comments


What you are describing seems strange to me, because by definition a thread pool is supposed not to use more resources than the system has available (i.e. if you have 4 cores, it will use 4 threads, or something close to that number). It uses a queue from which the worker threads take tasks and execute them. Therefore, you cannot truly oversubscribe the system if you use a thread pool, unless you manually specify the number of threads to use, which in your case is not recommended.

Have you tried using the standard C# ThreadPool class instead?
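A minimal way to try it (a sketch that also works on .NET 3.5; `DoWork` stands in for one of the OP's items):

```csharp
using System;
using System.Threading;

class StandardPoolSketch
{
    static void DoWork(int item) { /* one independent piece of work */ }

    static void Main()
    {
        int remaining = 100;
        using (var allDone = new ManualResetEvent(false))
        {
            for (int i = 0; i < 100; i++)
            {
                int item = i; // capture the loop variable per item
                // The pool itself decides how many threads run concurrently.
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    DoWork(item);
                    if (Interlocked.Decrement(ref remaining) == 0)
                        allDone.Set();
                });
            }
            allDone.WaitOne(); // block until all 100 items finish
        }
    }
}
```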

3 Comments

Thread pools don't always do a perfect job of keeping exactly one thread per core on the go at all times; you normally want more than one thread per core so that work continues while some threads are blocking, but how much more isn't always easy to predict. I agree they should try the standard pool for comparison, though.
@Jon Hanna: I agree the number is not always perfect, but ending up slower than sequential execution is still strange.
That all makes sense, and was my initial assumption, but it doesn't explain why the work takes longer overall when executed concurrently. I will swap out the smart thread pool, and I will also mock each piece of work as something compute-bound rather than the real work, which is I/O-bound and may be interfering with my results.
