
I am trying to parallelize some code using OpenMP. The serial time for my current input size is around 9 seconds. The code has the following form:

```c
int main() {
    /* do some stuff */
    myfunction();
}

void myfunction() {
    for (int i = 0; i < n; i++) {
        // it has some parameters but that is beyond the point I guess
        int rand = custom_random_generator();
        compute(rand);
    }
}
```

Here the random generator can be executed in parallel, since there are no dependencies between iterations, and the same goes for the compute function, so I attempted to parallelize this piece. All of my attempts failed. My first thought was to turn the loop body into tasks so they would execute in parallel, but that gave a slower result. Here is what I did:

```c
void myfunction() {
    for (int i = 0; i < n; i++) {
        #pragma omp task
        {
            // it has some parameters but that is beyond the point I guess
            int rand = custom_random_generator();
            compute(rand);
        }
    }
}
```

Result: 23 seconds, more than double the serial time

Putting the task on compute() only gave the same result.

Even worse attempt:

```c
void myfunction() {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        // it has some parameters but that is beyond the point I guess
        int rand = custom_random_generator();
        compute(rand);
    }
}
```

Result: 45 seconds

Theoretically speaking, why could this happen? I know that to diagnose my exact problem you would need a minimal reproducible example, but my goal with this question is to understand the different theories that could explain the behavior and apply them myself. Why would parallelizing an "embarrassingly parallel" piece of code result in far worse performance?

  • Something else is causing all this overhead; it cannot be only the thread creation. I would suspect false sharing, cache invalidation, and so on. How many cores do you have? Commented May 1, 2021 at 9:52
  • I have a MacBook Air 2019 with an Intel Core i5, so I have 2 cores but can run 4 threads in parallel as it supports hyperthreading @dreamcrash Commented May 1, 2021 at 9:54
  • Can you tell me the times with #pragma omp parallel for num_threads(1), then num_threads(2) ? Commented May 1, 2021 at 9:55
  • 1
    No problem, look for race conditions shared states this include external function calls Commented May 1, 2021 at 10:53
  • 1
    Your task code has no parallelism! (There is no parallel directive anywhere in the code). You may also want to look at papers on parallel random number generation, it is not a trivial task. (E.g. Parallel Random Numbers: As Easy as 1, 2, 3 - The Salmons thesalmons.org/john/random123/papers/random123sc11.pdf ) Commented May 3, 2021 at 8:28

1 Answer


One theory is the overhead associated with creating and maintaining multiple threads.

The advantages of parallel programming only show when each iteration performs enough processor-intensive work to outweigh that overhead.

A simple for loop with a cheap routine inside will not benefit from it.


6 Comments

My random generator is pretty complex, but my compute function just returns the result of a simple equation; I think I will try to parallelize only the generator then.
Task failed successfully! It improved the time to 18 seconds, but that is still twice as slow as serial. I guess that confirms that task creation is creating overhead.
@Sergio 🤣 Great throwback; Windows still has some gems though, check the return value of successful WinAPI function calls.
:D In all fairness, I think the problem is that the OP's example does not accurately represent the code being parallelized, hence one can only speculate.
