Best way to parallelize this loop in OpenMP

Question

I have the following loop, which I have attempted to parallelize using OpenMP, but I am seeing no performance improvement. Can anyone please suggest how to improve it?

      thread = omp_get_max_threads ( )
      chunk=jmaxm/thread
c$omp parallel shared (zetun,zetvn) private (i, j)
c$omp do schedule(DYNAMIC,chunk) ORDERED
      do j=2,jmaxm
        jm=j-1
        jp=j+1
        do i=2,imaxm
          if (rmask(i,j).eq.1.0)then
            im=i-1
            ip=i+1
            zetun(i,j)=
     +        (un(im,j,km)+un(ip,j,km)-2.*un(i,j,km))*recdx2
     +        + ((un(i,jp,km)-un(i,j,km))-
     +        (un(i,j,km)-un(i,jm,km)))*recdy2
            zetvn(i,j)=
     +        ((vn(ip,j,km)-vn(i,j,km))-
     +        (vn(i,j,km)-vn(im,j,km)))*recdx2
     +        + (vn(i,jp,km)+vn(i,jm,km)-2.*vn(i,j,km))*recdy2
          endif
        end do
      end do
c$omp end do nowait
c$omp end parallel

I have now modified the code as follows, but it still does not seem to make any improvement.

Modified code:

c$omp parallel shared (zetun,zetvn) private (i,j,jm,jp,im,ip,km)
c$omp do schedule(DYNAMIC,20)
      do j=2,jmaxm
        jm=j-1
        jp=j+1
        do i=2,imaxm
          if (rmask(i,j).eq.1.0)then
            im=i-1
            ip=i+1
            zetun(i,j)=
     +        (un(im,j,km)+un(ip,j,km)-2.*un(i,j,km))*recdx2
     +        + ((un(i,jp,km)-un(i,j,km))-
     +        (un(i,j,km)-un(i,jm,km)))*recdy2
            zetvn(i,j)=
     +        ((vn(ip,j,km)-vn(i,j,km))-
     +        (vn(i,j,km)-vn(im,j,km)))*recdx2
     +        + (vn(i,jp,km)+vn(i,jm,km)-2.*vn(i,j,km))*recdy2
          endif
        end do
      end do
c$omp end do
c$omp end parallel

Vladimir F Героям слава · Accepted Answer · 2013-12-22 09:29:27Z


The code is not valid. At the very least, jm, jp, im and ip have to be private. Also, why do you require ordered? It definitely slows things down. And why schedule dynamic with such a large chunk? Just use static.

Also, use some line indentation when coding, or at least when presenting your code to others.
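The changes suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not the asker's full program: the program name, the grid size, the test data and the temporaries d2x and d2y are hypothetical, introduced only so the example is self-contained and every line stays short. Note that km is only read inside the loop, so it is left shared here; a variable listed as private has no defined value on entry to the parallel region.

```fortran
      program stencil_static
      implicit none
      ! Hypothetical grid size and test data, for illustration only
      integer, parameter :: imax = 64, jmax = 64, nk = 1
      integer :: i, j, im, ip, jm, jp, km, imaxm, jmaxm
      real :: un(imax,jmax,nk), vn(imax,jmax,nk)
      real :: rmask(imax,jmax), zetun(imax,jmax), zetvn(imax,jmax)
      real :: recdx2, recdy2, d2x, d2y, err

      imaxm = imax - 1
      jmaxm = jmax - 1
      km = 1
      recdx2 = 1.0
      recdy2 = 1.0
      zetun = 0.0
      zetvn = 0.0
      rmask = 1.0
      ! un = i*i and vn = j*j, so both results should equal 2
      do j = 1, jmax
        do i = 1, imax
          un(i,j,1) = real(i*i)
          vn(i,j,1) = real(j*j)
        end do
      end do

      ! All loop-local scalars are private; km and the arrays are shared.
      ! No ordered clause, and a plain static schedule.
!$omp parallel default(shared) private(i,j,im,ip,jm,jp,d2x,d2y)
!$omp do schedule(static)
      do j = 2, jmaxm
        jm = j - 1
        jp = j + 1
        do i = 2, imaxm
          if (rmask(i,j) .eq. 1.0) then
            im = i - 1
            ip = i + 1
            ! Second differences of un in x and y
            d2x = un(im,j,km) + un(ip,j,km) - 2.*un(i,j,km)
            d2y = un(i,jp,km) + un(i,jm,km) - 2.*un(i,j,km)
            zetun(i,j) = d2x*recdx2 + d2y*recdy2
            ! Second differences of vn in x and y
            d2x = vn(ip,j,km) + vn(im,j,km) - 2.*vn(i,j,km)
            d2y = vn(i,jp,km) + vn(i,jm,km) - 2.*vn(i,j,km)
            zetvn(i,j) = d2x*recdx2 + d2y*recdy2
          end if
        end do
      end do
!$omp end do
!$omp end parallel

      ! Largest deviation from the expected value on the interior
      err = maxval(abs(zetun(2:imaxm,2:jmaxm) - 2.0))
      err = max(err, maxval(abs(zetvn(2:imaxm,2:jmaxm) - 2.0)))
      print *, 'max deviation from 2.0:', err
      end program stencil_static
```

With gfortran this would be built with -fopenmp; without that flag the directives are treated as comments and the loop runs serially with the same result. The y-direction term is written here as a single second difference, which is algebraically the same as the nested form in the question but may differ in the last bits of rounding.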


11 Comments

Jovi DSilva Over a year ago

I did as you said, but it is taking even more time now. What about nowait, is that needed?

Jovi DSilva Over a year ago

@HighPerformanceMark Yes, the same code as above. I did add jm, jp, im and ip as private, but it is still slow. I also changed DYNAMIC,chunk to STATIC.

Hristo Iliev Over a year ago

Since the innermost calculation is masked, and the computational load therefore depends on the content of rmask, static might not be the optimal scheduling. But neither is dynamic with such a large chunk size.

Hristo Iliev Over a year ago

@JoviDsilva, when there is some computational imbalance, dynamic scheduling helps. But in your case you set the chunk size to #iterations/#threads, which means that each thread gets an iteration chunk of the same size. The same is true for static scheduling with the default chunk size. Without any schedule clause, most OpenMP runtimes default to static (though this is not guaranteed by the standard). You could still use dynamic, but with a smaller chunk size; otherwise it makes little to no sense. Choosing the right chunk size is a bit tricky and might require experimenting.
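To illustrate the point about chunk size, here is a small sketch of dynamic scheduling with a small chunk on a masked loop. Everything in it is an assumption made for illustration: the program name, the array size n, the triangular mask (chosen only so that the masked work grows with j), the rowwork array and the chunk size of 4, which is an arbitrary starting value rather than a recommendation.

```fortran
      program masked_dynamic
      implicit none
      ! Hypothetical size and mask, chosen only to create imbalance
      integer, parameter :: n = 400
      integer :: i, j, total
      integer :: rowwork(n)
      real :: rmask(n,n)

      ! Triangular mask: for a given j, only i <= j is active
      do j = 1, n
        do i = 1, n
          rmask(i,j) = 0.0
          if (i .le. j) rmask(i,j) = 1.0
        end do
      end do

      ! Chunks of 4 values of j are handed out as threads become free.
      ! Each j writes only its own rowwork(j), so there is no race.
!$omp parallel do schedule(dynamic,4) default(shared) private(i,j)
      do j = 1, n
        rowwork(j) = 0
        do i = 1, n
          if (rmask(i,j) .eq. 1.0) rowwork(j) = rowwork(j) + 1
        end do
      end do
!$omp end parallel do

      total = sum(rowwork)
      print *, 'active points:', total
      end program masked_dynamic
```

For experimenting, schedule(dynamic,4) can be replaced with schedule(runtime); the schedule is then taken from the OMP_SCHEDULE environment variable (for example "dynamic,4" or "static"), so different chunk sizes can be timed without recompiling.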

Vladimir F Героям слава Over a year ago

Please, use some indentation for god's sake!
