
In the following example the C++11 threads take about 50 seconds to execute, but the OMP threads only 5 seconds. Any ideas why? (I can assure you it still holds true if you are doing real work instead of doNothing, or if you do it in a different order, etc.) I'm on a 16 core machine, too.

#include <iostream>
#include <omp.h>
#include <chrono>
#include <vector>
#include <thread>

using namespace std;

void doNothing() {}

int run(int algorithmToRun)
{
    auto startTime = std::chrono::system_clock::now();

    for(int j=1; j<100000; ++j)
    {
        if(algorithmToRun == 1)
        {
            vector<thread> threads;
            for(int i=0; i<16; i++)
            {
                threads.push_back(thread(doNothing));
            }
            for(auto& thread : threads) thread.join();
        }
        else if(algorithmToRun == 2)
        {
            #pragma omp parallel for num_threads(16)
            for(unsigned i=0; i<16; i++)
            {
                doNothing();
            }
        }
    }

    auto endTime = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = endTime - startTime;

    return elapsed_seconds.count();
}

int main()
{
    int cppt = run(1);
    int ompt = run(2);

    cout<<cppt<<endl;
    cout<<ompt<<endl;

    return 0;
}
  • My guess is that OpenMP is smart enough to optimize out the whole loop since it's a NOP. With threads you're suffering the overhead of spinning up and tearing down all those NOP threads. Try adding some actual code to the test function and see what happens. Commented Apr 24, 2014 at 1:16
  • Well, one thing is that you're using a dynamically resizing container to hold the threads; that can't help with performance. Commented Apr 24, 2014 at 1:17
  • Try just using a fixed sized array and initiating all its elements when created. Commented Apr 24, 2014 at 1:17
  • @aruisdante: I have added real code, and I can assure you the difference persists (I had lots of code and factored it down to post on here)--it's not due to the NOP. Commented Apr 24, 2014 at 1:18
  • @CoffeeandCode: I've done that (and just tried again), and the difference is negligible, as the call to thread() calls new anyway. Good point though--But I also can assure you that that does not affect the performance. Commented Apr 24, 2014 at 1:22

2 Answers


OpenMP uses a thread pool for its pragmas (also here and here). Spinning up and tearing down threads is expensive. OpenMP avoids this overhead, so all it does is the actual work plus the minimal shared-memory shuttling of the execution state. In your std::thread code, you spin up and tear down a new set of 16 threads on every iteration.
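To make the thread-reuse idea concrete, here is a minimal sketch of a fixed team of std::thread workers that is created once and reused for every "parallel region". It is not how any particular OpenMP runtime is implemented; `WorkerTeam`, `runOnAll` and `countInvocations` are hypothetical names invented for this illustration, and it assumes only one thread calls `runOnAll` at a time.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): workers are started once in the
// constructor and joined once in the destructor, so each region pays only for
// a wake-up and a barrier instead of thread creation and teardown.
class WorkerTeam {
public:
    explicit WorkerTeam(unsigned n) {
        assert(n > 0);
        for (unsigned id = 0; id < n; ++id)
            workers_.emplace_back([this, id] { workerLoop(id); });
    }

    ~WorkerTeam() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Run task(id) once on every worker and block until all have finished.
    // Assumption: called from a single thread at a time.
    void runOnAll(const std::function<void(unsigned)>& task) {
        std::unique_lock<std::mutex> lock(m_);
        task_ = &task;
        pending_ = workers_.size();
        ++generation_;  // signals a new region to the workers
        wake_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void workerLoop(unsigned id) {
        std::size_t seen = 0;  // last region this worker has executed
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            wake_.wait(lock, [&] { return stop_ || generation_ != seen; });
            if (stop_) return;
            seen = generation_;
            const std::function<void(unsigned)>* task = task_;
            lock.unlock();
            (*task)(id);  // run the user's work outside the lock
            lock.lock();
            if (--pending_ == 0) done_.notify_one();
        }
    }

    std::mutex m_;
    std::condition_variable wake_;
    std::condition_variable done_;
    const std::function<void(unsigned)>* task_ = nullptr;
    std::size_t pending_ = 0;
    std::size_t generation_ = 0;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};

// Runs `iterations` regions on one reused team and returns the total number
// of task invocations (workers * iterations).
long countInvocations(unsigned workers, int iterations) {
    WorkerTeam team(workers);
    std::atomic<long> calls{0};
    for (int j = 0; j < iterations; ++j)
        team.runOnAll([&](unsigned) { calls.fetch_add(1, std::memory_order_relaxed); });
    return calls.load();
}
```

Calling `countInvocations(16, 100000)` mirrors the shape of the loop in the question, but the 16 threads are created only once. Compile with `-pthread` (or your toolchain's equivalent).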


5 Comments

Thanks. This has to be the answer, but wouldn't you almost think that it would require a #pragma around the outer for loop? Also, how do you know that as a fact--I don't see information about it in the documentation, even in the linked site. I'm sure that you're right, I just want to back up the information. I haven't ever read, as a fact, that they do that.
Check out the second link, there's some talk in there. I can try and find more solid documentation, I know I've read it explicitly somewhere.
Here is another discussion about it. Basically, it's actually not defined by the OpenMP standard, but most implementations on most platforms seem to do it if it's more efficient.
Thanks again :). I figured it had to be threadpools, but I just surprisingly couldn't find it stated anywhere. After looking some more, I found this. I'm going to try it on a non-intel machine and see if it still holds true. You beat me to it--It does look like it's basically done in all implementations.
PS. I can confirm the difference also exists on AMD machines.

I ran code with a 100-iteration loop from Choosing the right threading framework, and it took 0.0727 milliseconds with OpenMP, 0.6759 milliseconds with Intel TBB, and 0.5962 milliseconds with the C++ thread library.

I also applied what AruisDante suggested:

void nested_loop(int max_i, int band)
{
    for (int i = 0; i < max_i; i++)
    {
        doNothing(band);
    }
}
...
else if (algorithmToRun == 5)
{
    thread bristle(nested_loop, max_i, band);
    bristle.join();
}

This code appears to take less time than your original C++11 thread section.
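If you want to see the creation cost in isolation, a rough sketch like the following measures the average time for one create-and-join round. `secondsPerSpawnJoinRound` is a hypothetical helper name, and it assumes `std::chrono::steady_clock` and a `double` result rather than the `system_clock` and `int` return used in the question, so sub-second values are not truncated.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

void doNothing() {}

// Hypothetical helper: average wall-clock seconds needed to create and join
// `teamSize` threads, measured over `rounds` repetitions.
double secondsPerSpawnJoinRound(int rounds, int teamSize) {
    assert(rounds > 0 && teamSize > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals

    const auto start = clock::now();
    for (int j = 0; j < rounds; ++j) {
        std::vector<std::thread> threads;
        threads.reserve(static_cast<std::size_t>(teamSize));
        for (int i = 0; i < teamSize; ++i) threads.emplace_back(doNothing);
        for (auto& t : threads) t.join();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;

    return elapsed.count() / rounds;
}
```

Multiplying the result of `secondsPerSpawnJoinRound(1000, 16)` by the number of outer iterations gives an estimate of how much of the total run time goes to thread creation and teardown alone. The numbers depend heavily on the OS and machine.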
