
UPDATE: I used threads to split the loop across the number of cores (8 in my case), and the complete loop finished in under 1 second. So the problem is not that the operation can't be made faster with threading. Why did the Parallel Extensions fail in this case?
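A rough sketch of the manual split described in the update. It assumes the work per booking is independent and CPU-bound. `GetCategoryIdStandIn` is a hypothetical stand-in for `manager.GetCategoryID`, and the bookings are plain integers. This is not the poster's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for manager.GetCategoryID: a bit of CPU-bound work.
static int GetCategoryIdStandIn(int booking)
{
    int h = booking;
    for (int i = 0; i < 10_000; i++)
        h = unchecked(h * 31 + i);
    return h & 0xFF;
}

int[] bookings = new int[1550];
for (int i = 0; i < bookings.Length; i++) bookings[i] = i;
string[] categories = new string[bookings.Length];

// Split the index range into one contiguous chunk per logical processor.
int threadCount = Environment.ProcessorCount;
int chunkSize = (bookings.Length + threadCount - 1) / threadCount;
var threads = new List<Thread>();

for (int t = 0; t < threadCount; t++)
{
    int start = t * chunkSize;                              // fresh copy captured per thread
    int end = Math.Min(start + chunkSize, bookings.Length);
    if (start >= end) break;                                // more threads than items

    var thread = new Thread(() =>
    {
        // Each thread writes only to its own slice, so no locking is needed.
        for (int i = start; i < end; i++)
            categories[i] = GetCategoryIdStandIn(bookings[i]).ToString();
    });
    threads.Add(thread);
    thread.Start();
}

foreach (var thread in threads) thread.Join();              // wait for every chunk to finish
Console.WriteLine($"{threads.Count} threads processed {bookings.Length} items");
```

Each thread gets one contiguous slice and writes only to its own indexes, so the threads never contend on shared state.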

Hey everyone. I want to replace my foreach with Parallel.ForEach. The problem is that the parallelization brings hardly any advantage for me.

Original:

```csharp
foreach (Entities.Buchung buchung in buchungen)
{
    // Average 4ms per call
    Int32 categoryID = manager.GetCategoryID(new Regelengine.Booking(buchung));
    buchung.Category = categoryID.ToString();
}
```

Parallel:

```csharp
System.Threading.Tasks.Parallel.ForEach(buchungen, buchung =>
{
    Int32 categoryID = manager.GetCategoryID(new Regelengine.Booking(buchung));
    buchung.Category = categoryID.ToString();
});
```

Results:

```
Stopwatch results for 1550 entries in the list:
Parallel.ForEach: 00:00:07.6599066
Average foreach:  00:00:07.9791303
```
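To check whether the loop body itself can scale, a timing harness along these lines may help. It assumes a hypothetical CPU-bound stand-in (`CategorizeStandIn`) in place of `GetCategoryID`. If the stand-in scales but the real call does not, the bottleneck is more likely inside the real call than in `Parallel.ForEach` itself.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical CPU-bound stand-in for GetCategoryID; the real method is unknown here.
static int CategorizeStandIn(int booking)
{
    int h = booking;
    for (int i = 0; i < 20_000; i++)
        h = unchecked(h * 31 + i);
    return h & 0xFF;
}

int[] bookings = Enumerable.Range(0, 1550).ToArray();   // value == index, used as slot below
var sequential = new string[bookings.Length];
var parallel = new string[bookings.Length];

CategorizeStandIn(0); // warm up (JIT) so the first timed loop is not penalized

var sw = Stopwatch.StartNew();
foreach (int b in bookings)
    sequential[b] = CategorizeStandIn(b).ToString();
TimeSpan seqTime = sw.Elapsed;

sw.Restart();
Parallel.ForEach(bookings, b =>
{
    // Each iteration writes a distinct slot, so no synchronization is needed.
    parallel[b] = CategorizeStandIn(b).ToString();
});
TimeSpan parTime = sw.Elapsed;

Console.WriteLine($"foreach:          {seqTime}");
Console.WriteLine($"Parallel.ForEach: {parTime}");
Console.WriteLine($"speedup:          {seqTime.TotalMilliseconds / parTime.TotalMilliseconds:F1}x");
```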

Maybe the problem is that the actual work in the loop is so short? But nobody can tell me that parallelizing 1550 operations on an Intel i7 won't save any time.

  • There is probably a lock in that Regelengine thing. Commented Feb 4, 2011 at 16:01
  • The question is: does the method inside the loop body actually benefit from parallelism? The next thing I don't know is what GetCategoryID does. Is there a database call that might be the bottleneck and prevent the code from benefiting from multithreading? Commented Feb 4, 2011 at 16:02
  • What happens in the method manager.GetCategoryID? What happens in the constructor new Regelengine.Booking? Commented Feb 4, 2011 at 16:02
  • There are no database or network calls. The constructor converts my entity into the entity used for the call to manager.GetCategoryID(), which runs on a COM library. Commented Feb 4, 2011 at 16:06

5 Answers

Answer 1 (score 10):

There is only one resource you can take advantage of by using Parallel.For: CPU cycles. When you have N cores, you can theoretically speed up your code by a factor of N. What is required, however, is that CPU cycles are actually the constraint in your code, which is often not the case unless you execute computationally expensive code. Other constraints are the speed of the hard disk, the network connection, a database server, and in select cases the bandwidth of the memory bus. You've only got one of those; Parallel.For cannot magically give you another disk.
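The "only one disk" point can be sketched by modelling a single shared resource as a lock around a short wait. The resource, the 5 ms delay, and the item count are illustrative assumptions. The point is that calls queue up on the one resource, so adding cores does not shorten the total.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of a single shared resource (one disk, one database connection,
// or a component that serializes its callers): only one caller can use it at a time.
object singleResource = new object();
const int Items = 40;
const int MillisecondsPerCall = 5;

void UseSingleResource()
{
    lock (singleResource)
    {
        Thread.Sleep(MillisecondsPerCall); // time spent "inside" the resource
    }
}

var sw = Stopwatch.StartNew();
Parallel.For(0, Items, _ => UseSingleResource());
TimeSpan elapsed = sw.Elapsed;

// However many cores run the loop, the calls queue up on the one resource,
// so the total stays at roughly Items * MillisecondsPerCall.
Console.WriteLine($"{Items} calls x {MillisecondsPerCall} ms took {elapsed.TotalMilliseconds:F0} ms");
```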

Testing whether Parallel.For will speed up your code is pretty simple. Just run the code without parallelizing and observe the CPU load in Taskmgr.exe or Perfmon. If one core isn't running at 100%, your code is not compute bound. If it is running at, say, 10%, then you can only ever hope to make it take 90% of the time, no matter how many cores you have. You'll get that by overlapping I/O wait time with processing time, and two threads will get that done.
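The same check can be done in code by comparing the process's CPU time with wall-clock time while the work runs. This is a minimal sketch: the two workloads are hypothetical stand-ins for I/O-style waiting and for computation. Because `Process.TotalProcessorTime` covers the whole process, time used by other threads also counts.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Runs `work` and returns how many cores' worth of CPU the process used meanwhile:
// about 1.0 means one core was busy the whole time; far below 1.0 means mostly waiting.
static double CoresUsed(Action work)
{
    TimeSpan cpuBefore = Process.GetCurrentProcess().TotalProcessorTime;
    var wall = Stopwatch.StartNew();
    work();
    wall.Stop();
    TimeSpan cpuAfter = Process.GetCurrentProcess().TotalProcessorTime;
    return (cpuAfter - cpuBefore).TotalMilliseconds / wall.Elapsed.TotalMilliseconds;
}

// A waiting workload (stand-in for I/O) versus a busy loop (stand-in for computation).
double waiting = CoresUsed(() => Thread.Sleep(300));
double busy = CoresUsed(() =>
{
    var sw = Stopwatch.StartNew();
    while (sw.ElapsedMilliseconds < 300) { } // spin on one core
});

Console.WriteLine($"waiting workload: {waiting:F2} cores, busy workload: {busy:F2} cores");
Console.WriteLine($"this machine has {Environment.ProcessorCount} logical processors");
```

Wrapping the original sequential loop in `CoresUsed` would give a rough answer to the same question Task Manager answers: a value near 1.0 suggests a compute-bound loop, a value far below it suggests waiting somewhere else.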


Answer 2 (score 5):

Questions you should consider here are:

  • What is the overhead of spinning up a thread?
  • What is the overhead of my thread safety (locks)?
  • Where are the actual bottlenecks and will multithreading really help?

The last is your biggest consideration here. For example, if you are maxing out your I/O channel, all the threads in the world won't do squat. So is your task CPU bound or I/O bound?
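On the lock-overhead question, a common pattern is to avoid touching shared state on every iteration. The sketch below uses an illustrative summing workload. It contrasts a lock taken per iteration with the `Parallel.ForEach` overload that takes `localInit` and `localFinally`, which merges per-worker results only once per worker.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int[] values = new int[100_000];
for (int i = 0; i < values.Length; i++) values[i] = i % 7;

// Variant 1: every iteration takes the same lock, so workers spend time queuing on it.
long lockedTotal = 0;
object gate = new object();
Parallel.ForEach(values, v =>
{
    lock (gate) { lockedTotal += v; }
});

// Variant 2: each worker keeps its own running subtotal and synchronizes
// only once, when it merges that subtotal into the shared result.
long localTotal = 0;
Parallel.ForEach(
    values,
    () => 0L,                                                // localInit: per-worker subtotal
    (v, loopState, subtotal) => subtotal + v,                // body: no shared state touched
    subtotal => Interlocked.Add(ref localTotal, subtotal));  // localFinally: merge once

Console.WriteLine($"locked: {lockedTotal}, thread-local: {localTotal}");
```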

Comments:

  • Thread creation shouldn't be expensive enough to cause this effect. And mentioning locks without saying where they are isn't useful.
  • Thanks for your answer, but I have now used normal threads to split up the loop manually, and it finished in no time. So the problem is not threading itself, but the Parallel.ForEach loop?
Answer 3 (score 1):

I think you're right; the work does look a little too short to be worth using Parallel.ForEach. I use Parallel.ForEach only when I know that something significant will happen in the loop body that takes time, or could take time, like a database connection or sending a lot of data to a web service. If it is just processing information on the server, like getting IDs from a collection that has already been loaded into memory, then it really wouldn't be worth it.


Answer 4 (score 1):

Parallelism won't be faster if you don't have available cores to use. So when I see code like this, my first thought is that you have other threads running.

It could also be the workload. The synchronization logic isn't free, and each iteration doesn't do much. Consider looking at the other overloads of Parallel.ForEach for options you can tweak.
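As an example of those overloads, here is a sketch that assumes the work is cheap per item and can be addressed by position. `Partitioner.Create(from, to, rangeSize)` hands each worker a block of indexes, and `ParallelOptions.MaxDegreeOfParallelism` caps how many workers run at once. `Work`, the range size of 200, and the degree of parallelism are illustrative choices, not tuned values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical cheap per-item work, standing in for a short loop body.
static int Work(int x) => unchecked(x * 31 + 7);

int[] input = new int[1550];
for (int i = 0; i < input.Length; i++) input[i] = i;
int[] output = new int[input.Length];

var options = new ParallelOptions
{
    // Cap the number of concurrently running workers; here, one per logical processor.
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

// Partitioner.Create(from, to, rangeSize) yields Tuple<int, int> index ranges, so the
// delegate is invoked once per range (here up to 200 items) instead of once per item.
Parallel.ForEach(Partitioner.Create(0, input.Length, 200), options, range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
        output[i] = Work(input[i]);
});

Console.WriteLine($"processed {output.Length} items in ranges of up to 200");
```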

Also try using Parallel.For. You cannot read from an IEnumerable in parallel, but you can read from an IList using indexes.
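A minimal sketch of the `Parallel.For` suggestion follows. It assumes the bookings are held in an `IList<T>` such as `List<T>`. The data and the modulo "categorization" are hypothetical placeholders for the real call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins for the booking list; the real entities are not shown here.
List<int> bookings = new List<int>();
for (int n = 0; n < 1550; n++) bookings.Add(n * 3);
string[] categories = new string[bookings.Count];

// Parallel.For hands out index ranges, so each worker reads the list by index.
// With a plain IEnumerable, workers would instead have to take items from one shared enumerator.
Parallel.For(0, bookings.Count, i =>
{
    // Concurrent reads of a List<T> are safe as long as nothing modifies it meanwhile.
    categories[i] = (bookings[i] % 10).ToString();
});

Console.WriteLine($"categorized {categories.Length} bookings");
```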


Answer 5 (score 0):

First off, 1550 isn't much. For example, sorting an array of this many elements will usually be faster when done sequentially than when done in parallel. It all depends on the operation.
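The sorting comparison can be tried directly. This sketch uses random integers as input and PLINQ's `AsParallel().OrderBy` as the parallel variant. Both paths are warmed up once first so JIT compilation is not timed. Timings depend on the machine, and a single run is only indicative.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

var rng = new Random(42);
int[] data = Enumerable.Range(0, 1550).Select(_ => rng.Next()).ToArray();

// Warm-up: run both paths once so JIT compilation is not included in the timings.
Array.Sort((int[])data.Clone());
data.AsParallel().OrderBy(x => x).ToArray();

// Sequential sort of a copy.
var sw = Stopwatch.StartNew();
int[] seqSorted = (int[])data.Clone();
Array.Sort(seqSorted);
TimeSpan seqTime = sw.Elapsed;

// Parallel sort: PLINQ has to partition the data and coordinate workers,
// which for only 1550 items can easily cost more than the sort itself.
sw.Restart();
int[] parSorted = data.AsParallel().OrderBy(x => x).ToArray();
TimeSpan parTime = sw.Elapsed;

Console.WriteLine($"Array.Sort: {seqTime.TotalMilliseconds:F3} ms, PLINQ OrderBy: {parTime.TotalMilliseconds:F3} ms");
```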

Secondly, what does GetCategoryID do? Does it use locks? For that matter, does the Regelengine.Booking constructor?

A total running time of 7 seconds indicates that the operation is slow enough that it should benefit from parallelization. On the other hand, your code seems to indicate that not a lot of processing is actually going on here. You're most probably loading data from disk or from a database. In both cases, that's a bottleneck that parallelization can do (almost) nothing about. Concurrent processing makes your code faster only if it's compute bound.

But you’ve not given enough information to determine that.

