I have the following loop, which I have attempted to parallelize using OpenMP, but I am seeing no performance improvement. Can anyone suggest how to improve it?

      thread = omp_get_max_threads ( )
      chunk=jmaxm/thread
c$omp parallel shared (zetun,zetvn) private (i, j)
c$omp do schedule(DYNAMIC,chunk) ORDERED
      do j=2,jmaxm
        jm=j-1
        jp=j+1
        do i=2,imaxm
          if (rmask(i,j).eq.1.0)then
            im=i-1
            ip=i+1
            zetun(i,j)=
     +        (un(im,j,km)+un(ip,j,km)-2.*un(i,j,km))*recdx2 +
     +        ((un(i,jp,km)-un(i,j,km))-
     +        (un(i,j,km)-un(i,jm,km)))*recdy2
            zetvn(i,j)=
     +        ((vn(ip,j,km)-vn(i,j,km))-
     +        (vn(i,j,km)-vn(im,j,km)))*recdx2 +
     +        (vn(i,jp,km)+vn(i,jm,km)-2.*vn(i,j,km))*recdy2
          endif
        end do
      end do
c$omp end do nowait
c$omp end parallel

I have now modified the code as follows, but it still does not show any improvement.

MODIFIED CODE:

c$omp parallel shared (zetun,zetvn) private (i,j,jm,jp,im,ip,km)
c$omp do schedule(DYNAMIC,20)
      do j=2,jmaxm
        jm=j-1
        jp=j+1
        do i=2,imaxm
          if (rmask(i,j).eq.1.0)then
            im=i-1
            ip=i+1
            zetun(i,j)=
     +        (un(im,j,km)+un(ip,j,km)-2.*un(i,j,km))*recdx2 +
     +        ((un(i,jp,km)-un(i,j,km))-
     +        (un(i,j,km)-un(i,jm,km)))*recdy2
            zetvn(i,j)=
     +        ((vn(ip,j,km)-vn(i,j,km))-
     +        (vn(i,j,km)-vn(im,j,km)))*recdx2 +
     +        (vn(i,jp,km)+vn(i,jm,km)-2.*vn(i,j,km))*recdy2
          endif
        end do
      end do
c$omp end do
c$omp end parallel

1 Answer

The code is not valid: jm, jp, im and ip have to be private at the very least. Also, why do you require ordered? It definitely slows things down. And why use schedule dynamic with such a large chunk? Just use static.

Also, use some line indentation when coding, or at least when presenting your code to others.
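A minimal sketch of the loop with the points above applied (no ordered, static schedule, all loop temporaries private, indented source). The module name `zeta_kernel`, the subroutine name `compute_zeta` and its argument list are assumptions made only to keep the example self-contained; the array and variable names come from the question. It is written in free-form Fortran rather than the fixed form used in the question, and `default(none)` is an addition that makes the compiler reject any variable whose sharing was forgotten. Note that `km` is only read inside the loop, so it is kept shared here: a variable listed in `private` is undefined on entry to the parallel region.

```fortran
module zeta_kernel
  implicit none
contains
  ! Hypothetical wrapper around the loop from the question; the interface is assumed.
  subroutine compute_zeta(un, vn, rmask, zetun, zetvn, imaxm, jmaxm, km, &
                          recdx2, recdy2)
    integer, intent(in)    :: imaxm, jmaxm, km
    real,    intent(in)    :: un(:,:,:), vn(:,:,:), rmask(:,:)
    real,    intent(in)    :: recdx2, recdy2
    real,    intent(inout) :: zetun(:,:), zetvn(:,:)
    integer :: i, j, im, ip, jm, jp

    ! Every index temporary is private; km is read-only and therefore shared.
    !$omp parallel do schedule(static) default(none) &
    !$omp& shared(un, vn, rmask, zetun, zetvn, imaxm, jmaxm, km, recdx2, recdy2) &
    !$omp& private(i, j, im, ip, jm, jp)
    do j = 2, jmaxm
      jm = j - 1
      jp = j + 1
      do i = 2, imaxm
        if (rmask(i,j) == 1.0) then
          im = i - 1
          ip = i + 1
          zetun(i,j) = (un(im,j,km) + un(ip,j,km) - 2.*un(i,j,km))*recdx2 &
                     + ((un(i,jp,km) - un(i,j,km)) &
                     -  (un(i,j,km) - un(i,jm,km)))*recdy2
          zetvn(i,j) = ((vn(ip,j,km) - vn(i,j,km)) &
                     -  (vn(i,j,km) - vn(im,j,km)))*recdx2 &
                     + (vn(i,jp,km) + vn(i,jm,km) - 2.*vn(i,j,km))*recdy2
        end if
      end do
    end do
    !$omp end parallel do
  end subroutine compute_zeta
end module zeta_kernel
```

The arrays are assumed to extend at least to `imaxm+1` and `jmaxm+1`, since the stencil reads `ip` and `jp`. Whether this version actually runs faster still depends on the problem size and on how much of the time goes to memory traffic rather than arithmetic.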

11 Comments

I did as you said, but it is taking even more time now. What about nowait, is that needed?
@HighPerformanceMark Yes, the same code as above. I did add jm, jp, im and ip as private, but it is still slow. I also changed DYNAMIC,chunk to STATIC.
Since the innermost calculation is masked, and the computational load therefore depends on the content of rmask, static might not be the optimal scheduling. But neither is dynamic with such a large chunk size.
@JoviDsilva, when there is some computational imbalance, dynamic scheduling helps. But in your case you set the chunk size to #iterations/#threads, which means that each thread gets an iteration chunk of the same size. The same is true for static scheduling with the default chunk size. Without any scheduling specification, most OMP runtimes default to static (though this is not guaranteed by the standard). You could still use dynamic, but with a smaller chunk size; otherwise it makes little to no sense. Choosing the correct chunk size is a bit tricky and might require experimenting.
Please, use some indentation for god's sake!
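One way to do the chunk-size experimenting mentioned in the comments without recompiling is `schedule(runtime)`, which defers the choice to the `OMP_SCHEDULE` environment variable. The sketch below is only an illustration under stated assumptions: the module name `sched_experiment`, the function `timed_stencil` and the simplified single-field masked stencil are hypothetical and are not the code from the question. Timing uses the standard `system_clock` intrinsic so the example also compiles without OpenMP.

```fortran
module sched_experiment
  implicit none
contains
  ! Hypothetical helper: applies a masked 5-point stencil to u, stores the
  ! result in z, and returns the elapsed wall-clock time in seconds.
  function timed_stencil(u, mask, z, rdx2, rdy2) result(seconds)
    real, intent(in)    :: u(:,:), mask(:,:), rdx2, rdy2
    real, intent(inout) :: z(:,:)
    real    :: seconds
    integer :: i, j, t0, t1, rate

    call system_clock(t0, rate)
    ! schedule(runtime): the schedule kind and chunk size are read from
    ! OMP_SCHEDULE when the program runs.
    !$omp parallel do schedule(runtime) default(none) &
    !$omp& shared(u, mask, z, rdx2, rdy2) private(i, j)
    do j = 2, size(u, 2) - 1
      do i = 2, size(u, 1) - 1
        if (mask(i,j) == 1.0) then
          z(i,j) = (u(i-1,j) + u(i+1,j) - 2.*u(i,j))*rdx2 &
                 + (u(i,j-1) + u(i,j+1) - 2.*u(i,j))*rdy2
        end if
      end do
    end do
    !$omp end parallel do
    call system_clock(t1)
    seconds = real(t1 - t0) / real(rate)
  end function timed_stencil
end module sched_experiment
```

A driver program calling `timed_stencil` could then be run several times with, for example, `OMP_SCHEDULE="static"`, `OMP_SCHEDULE="dynamic,4"` or `OMP_SCHEDULE="guided"` set in the environment, comparing the reported times on the real rmask. The values shown are just starting points, not recommendations.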