
I'm trying to parallelize the following function (pseudocode):

```
vector<int32> out;
for (int32 i = 0; i < 10; ++i) {
    int32 result = multiplyStuffByTwo(i);
    // Push to results
    out.push_back(result);
}
```

When I now parallelize the for loop and mark the push_back part as a critical section, I run into the problem that (of course) the order of the results in out is not always right. How can I make the threads execute the last line of the loop body in the right order? Thanks!
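For reference, my parallelized attempt looks roughly like this (a sketch; multiplyStuffByTwo is a stand-in for the real function). The critical section keeps push_back() safe, but it does nothing about ordering:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the function in the question.
static int32_t multiplyStuffByTwo(int32_t i) { return i * 2; }

std::vector<int32_t> parallelDoubleUnordered(int32_t n) {
    std::vector<int32_t> out;
    // The critical section makes push_back() thread-safe, but threads
    // reach it in an arbitrary order, so elements can land out of order.
    #pragma omp parallel for
    for (int32_t i = 0; i < n; ++i) {
        int32_t result = multiplyStuffByTwo(i);
        #pragma omp critical
        out.push_back(result);
    }
    return out;
}
```

(Compiled without -fopenmp the pragmas are ignored and the loop runs sequentially, which is why the order only breaks in the parallel build.)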

2 Answers


You can set the size of the out vector by calling out.resize() and then set each value by index instead of using push_back().

Pseudo-code:

```
vector<int32> out;
out.resize(10);
for (int32 i = 0; i < 10; ++i) {
    int32 result = multiplyStuffByTwo(i);
    // Set the result by index
    out[i] = result;
}
```

But I'd recommend using "classic" arrays. They're much faster and not really harder to manage.
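Putting this answer together with the OpenMP loop gives something like the following sketch (multiplyStuffByTwo again stands in for the real function). Because each iteration writes only to its own slot, no lock is needed and the results come out in order:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the function in the question.
static int32_t multiplyStuffByTwo(int32_t i) { return i * 2; }

std::vector<int32_t> parallelDoubleOrdered(int32_t n) {
    std::vector<int32_t> out;
    out.resize(n);  // allocate all slots up front, before the parallel region
    #pragma omp parallel for
    for (int32_t i = 0; i < n; ++i) {
        // Each iteration writes only to out[i], so no two threads ever
        // touch the same element and no critical section is needed.
        out[i] = multiplyStuffByTwo(i);
    }
    return out;
}
```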


6 Comments

Thank you! I will try that today. When using a vector, do I need to set a thread lock (critical section) on the write?
Actually, in general the vector class is not thread-safe. That's why push_back() is very problematic. But in this case I don't expect any problems. Once again: better to use a normal array for such simple operations (and if you don't need a resizable container).
Why would you expect "classic" arrays to be any faster, once you've moved the resize out of the loop? They should be identical.
You are right. I thought operator[] would throw an exception when the index is out of range, but only at() throws. So there actually shouldn't be any performance issue here.
So the new problem is that the loop becomes slower now, as (I think) every write to out invalidates the other cores' caches. I could imagine using a private out for every thread and then interleaving them, but OpenMP only allows reduction with primitive operations like */+/-, not assignment. Does anyone have a hint about that? Thanks!
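On the cache-invalidation comment above: OpenMP reductions indeed cover only arithmetic operators, but the "private out per thread" idea can be written by hand. A sketch, assuming each thread handles one contiguous chunk (function and variable names are illustrative; the `#ifdef _OPENMP` guard just lets it also build without OpenMP):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the function in the question.
static int32_t multiplyStuffByTwo(int32_t i) { return i * 2; }

// Each thread fills a private local buffer for its own contiguous chunk,
// then copies it back into its region of out in one pass, so threads
// never write to neighbouring elements while computing.
std::vector<int32_t> parallelDoubleChunked(int32_t n) {
    std::vector<int32_t> out(n);
    #pragma omp parallel
    {
    #ifdef _OPENMP
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
    #else
        int nthreads = 1, tid = 0;
    #endif
        // Contiguous chunk [lo, hi) assigned to this thread.
        int32_t lo = static_cast<int32_t>(static_cast<int64_t>(n) * tid / nthreads);
        int32_t hi = static_cast<int32_t>(static_cast<int64_t>(n) * (tid + 1) / nthreads);
        std::vector<int32_t> local;
        local.reserve(hi - lo);
        for (int32_t i = lo; i < hi; ++i)
            local.push_back(multiplyStuffByTwo(i));
        // One ordered copy back into this thread's region of out.
        std::copy(local.begin(), local.end(), out.begin() + lo);
    }
    return out;
}
```

Whether this actually beats writing out[i] directly depends on the workload; with contiguous chunks, false sharing can only occur at chunk boundaries anyway.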
```
vector<int32> out;
#pragma omp parallel for ordered
for (int32 i = 0; i < 10; ++i) {
    int32 result = multiplyStuffByTwo(i); // this runs in parallel
    #pragma omp ordered
    // Push to results
    out.push_back(result); // this runs sequentially, in loop order
}
```

This can be helpful:

http://openmp.org/mp-documents/omp-hands-on-SC08.pdf

3 Comments

-1: If you add that pragma you lose the parallelisation, which was the whole point.
Thanks, I also checked that before, and the execution time goes up because of the necessary thread handling.
@Jørgen Fogh: right, I corrected the snippet; now only the vector operation is serial, and multiplyStuffByTwo runs in parallel. If that operation takes long enough, you should still gain here.
