
I have this C++ code.

The loop goes through the matrix, finds the minimum element in each row, and subtracts it from each element of the corresponding row. The variable myr is the sum of all the minimum elements.

I am trying to parallelize it with parallel for:

    int min = 0;
    int myr = 0;
    int temp[SIZE][SIZE];
    int size = 0;
    ... // some initialization

    omp_set_num_threads(1);
    start_time = omp_get_wtime();
    #ifdef _OPENMP
    #pragma omp parallel for firstprivate(min, size) reduction(+:myr)
    #endif
    for (int i = 0; i < size; i++) {
        min = INT_MAX;  // was INFINITY, which is a float; INT_MAX is the proper int sentinel
        for (int j = 0; j < size; j++) {
            if (temp[i][j] < min)
                min = temp[i][j];
        }
        myr += min;
        for (int j = 0; j < size; j++)
            temp[i][j] -= min;
    }
    end_time = omp_get_wtime();

If I set omp_set_num_threads(2); instead, this part of the code starts working slower.

My processor has 2 cores.

Why does the code work slower with 2 threads?

  • First of all, OMP doesn't mean that automagically you get increased speed. Second, the conditional branch probably acts as a barrier, so the overhead is bigger. Commented Sep 10, 2012 at 13:19
  • The ultimate question is: is your algorithm suitable for data parallelism? Can thread A run one iteration of your outer for loop and thread B another iteration of the outer loop without them having to wait on each other? Commented Sep 10, 2012 at 13:20
  • From first looks, it cannot. So your adding a thread is futile. Commented Sep 10, 2012 at 13:21
  • @Tony The Lion: Why do you say it cannot? The only part where they clash is the reduction variable, which is one addition done at the end. Commented Sep 10, 2012 at 13:22
  • I've seen lots of questions on the multithreading tag where people just assume that more threads equal better performance. Maybe we should create a wiki or something explaining why it's not always the case. Commented Sep 10, 2012 at 13:22

2 Answers


There must be some aliasing or something going on. Make things simpler for OpenMP:

    int const size0 = size;
    #ifdef _OPENMP
    #pragma omp parallel for reduction(+:myr)
    #endif
    for (int i = 0; i < size0; i++) {
        int min = INT_MAX;  // INT_MAX rather than the float constant INFINITY
        int *tmp = temp[i];
        for (int j = 0; j < size0; j++) {
            if (tmp[j] < min)
                min = tmp[j];
        }
        for (int j = 0; j < size0; j++)
            tmp[j] -= min;
        myr += min;
    }

That is, keep as many variables as possible local, and make them const where you can.




The parallel part can be reinterpreted as follows (I have used the snippet by @jens-gustedt, but in my experience it didn't make much difference):

    #pragma omp parallel private(myr_private) shared(myr)
    {
        myr_private = 0;
        #pragma omp for
        for (int i = 0; i < size; i++) {
            int min = INT_MAX;
            int *tmp = temp[i];
            for (int j = 0; j < size; j++) {
                if (tmp[j] < min)
                    min = tmp[j];
            }
            for (int j = 0; j < size; j++)
                tmp[j] -= min;
            myr_private += min;
        }
        #pragma omp critical
        {
            myr += myr_private;
        }
    }

(This interpretation comes straight from http://www.openmp.org/mp-documents/OpenMP3.1.pdf, Example A.36.2c.) If the number of threads is n > 1, there is overhead when #pragma omp parallel creates the additional thread(s), and again in the critical section, where all of the threads have to wait for each other.

I have experimented with different matrix sizes, and in my limited tests two threads were considerably faster with sizes above 1000, and started lagging behind with sizes below 500.


