
I'm trying to parallelize the following function (pseudocode):

```
vector<int32> out;
for (int32 i = 0; i < 10; ++i) {
    int32 result = multiplyStuffByTwo(i);
    // Push to results
    out.push_back(result);
}
```

When I now parallelize the for loop and mark the push_back part as a critical section, I run into the problem that (of course) the order of the results in out is not always right. How can I make the threads execute the last line of the loop body in the right order? Thanks!
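For reference, my parallelized attempt looks roughly like this (a sketch; multiplyStuffByTwo is a stand-in for the real function). The critical section keeps push_back() safe, but it does nothing about ordering:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the function in the question.
static int32_t multiplyStuffByTwo(int32_t i) { return i * 2; }

std::vector<int32_t> parallelDoubleUnordered(int32_t n) {
    std::vector<int32_t> out;
    // The critical section makes push_back() thread-safe, but threads
    // reach it in an arbitrary order, so elements can land out of order.
    #pragma omp parallel for
    for (int32_t i = 0; i < n; ++i) {
        int32_t result = multiplyStuffByTwo(i);
        #pragma omp critical
        out.push_back(result);
    }
    return out;
}
```

(Compiled without -fopenmp the pragmas are ignored and the loop runs sequentially, which is why the order only breaks in the parallel build.)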

2 Answers


You can set the size of the out vector by calling out.resize() and then set each value by index instead of using push_back().

Pseudo-code:

```
vector<int32> out;
out.resize(10);
for (int32 i = 0; i < 10; ++i) {
    int32 result = multiplyStuffByTwo(i);
    // Set the result by index
    out[i] = result;
}
```

But I'd recommend using "classic" arrays. They're much faster and not really harder to manage.
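Putting this answer together with the OpenMP loop gives something like the following sketch (multiplyStuffByTwo again stands in for the real function). Because each iteration writes only to its own slot, no lock is needed and the results come out in order:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the function in the question.
static int32_t multiplyStuffByTwo(int32_t i) { return i * 2; }

std::vector<int32_t> parallelDoubleOrdered(int32_t n) {
    std::vector<int32_t> out;
    out.resize(n);  // allocate all slots up front, before the parallel region
    #pragma omp parallel for
    for (int32_t i = 0; i < n; ++i) {
        // Each iteration writes only to out[i], so no two threads ever
        // touch the same element and no critical section is needed.
        out[i] = multiplyStuffByTwo(i);
    }
    return out;
}
```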


6 Comments

Thank you! I will try that today. When using a vector, do I need to set a thread lock (critical section) on the write?
Actually, in general the vector class is not thread-safe. That's why push_back() is very problematic. But in this case I don't expect any problems. Once again: better to use a normal array for such simple operations (and if you don't need a resizable container).
Why would you expect "classic" arrays to be any faster, once you've moved the resize out of the loop? They should be identical.
You are right. I thought operator[] would throw an exception when the index is out of range, but only at() throws. So there actually shouldn't be any performance issue here.
So the new problem is that the loop becomes slower now, as (I think) every write to out invalidates the other cores' caches. I could imagine using a private out for every thread and then interleaving them, but OpenMP only allows reduction with primitive operations like */+/-, not assignment. Does anyone have a hint about that? Thanks!
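On the cache-invalidation comment above: OpenMP reductions indeed cover only arithmetic operators, but the "private out per thread" idea can be written by hand. A sketch, assuming each thread handles one contiguous chunk (function and variable names are illustrative; the `#ifdef _OPENMP` guard just lets it also build without OpenMP):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the function in the question.
static int32_t multiplyStuffByTwo(int32_t i) { return i * 2; }

// Each thread fills a private local buffer for its own contiguous chunk,
// then copies it back into its region of out in one pass, so threads
// never write to neighbouring elements while computing.
std::vector<int32_t> parallelDoubleChunked(int32_t n) {
    std::vector<int32_t> out(n);
    #pragma omp parallel
    {
    #ifdef _OPENMP
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
    #else
        int nthreads = 1, tid = 0;
    #endif
        // Contiguous chunk [lo, hi) assigned to this thread.
        int32_t lo = static_cast<int32_t>(static_cast<int64_t>(n) * tid / nthreads);
        int32_t hi = static_cast<int32_t>(static_cast<int64_t>(n) * (tid + 1) / nthreads);
        std::vector<int32_t> local;
        local.reserve(hi - lo);
        for (int32_t i = lo; i < hi; ++i)
            local.push_back(multiplyStuffByTwo(i));
        // One ordered copy back into this thread's region of out.
        std::copy(local.begin(), local.end(), out.begin() + lo);
    }
    return out;
}
```

Whether this actually beats writing out[i] directly depends on the workload; with contiguous chunks, false sharing can only occur at chunk boundaries anyway.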
```
vector<int32> out;
#pragma omp parallel for ordered
for (int32 i = 0; i < 10; ++i) {
    int32 result = multiplyStuffByTwo(i); // this runs in parallel
    #pragma omp ordered
    // Push to results
    out.push_back(result); // this runs sequentially, in loop order
}
```

This can be helpful:

http://openmp.org/mp-documents/omp-hands-on-SC08.pdf

3 Comments

-1: If you add that pragma you lose the parallelisation, which was the whole point.
Thanks, I also checked that before, and the execution time goes up because of the necessary thread handling.
@Jørgen Fogh: right, I corrected the snippet; now only the vector operation is serial, and multiplyStuffByTwo runs in parallel. If that operation takes long enough, you should still gain here.
