I have the following loop, which I have attempted to parallelize using OpenMP, but I am seeing no performance improvement. Can anyone please suggest how to improve it?
      thread = omp_get_max_threads ( )
      chunk=jmaxm/thread
c$omp parallel shared (zetun,zetvn) private (i, j)
c$omp do schedule(DYNAMIC,chunk) ORDERED
      do j=2,jmaxm
        jm=j-1
        jp=j+1
        do i=2,imaxm
          if (rmask(i,j).eq.1.0)then
            im=i-1
            ip=i+1
            zetun(i,j)=
     +         (un(im,j,km)+un(ip,j,km)-2.*un(i,j,km))*recdx2
     +       + ((un(i,jp,km)-un(i,j,km))-
     +          (un(i,j,km)-un(i,jm,km)))*recdy2
            zetvn(i,j)=
     +         ((vn(ip,j,km)-vn(i,j,km))-
     +          (vn(i,j,km)-vn(im,j,km)))*recdx2
     +       + (vn(i,jp,km)+vn(i,jm,km)-2.*vn(i,j,km))*recdy2
          endif
        end do
      end do
c$omp end do nowait
c$omp end parallel

I am now adding the modified code as follows, but it still does not seem to make any improvement.

MODIFIED CODE:
c$omp parallel shared (zetun,zetvn) private (i,j,jm,jp,im,ip,km)
c$omp do schedule(DYNAMIC,20)
      do j=2,jmaxm
        jm=j-1
        jp=j+1
        do i=2,imaxm
          if (rmask(i,j).eq.1.0)then
            im=i-1
            ip=i+1
            zetun(i,j)=
     +         (un(im,j,km)+un(ip,j,km)-2.*un(i,j,km))*recdx2
     +       + ((un(i,jp,km)-un(i,j,km))-
     +          (un(i,j,km)-un(i,jm,km)))*recdy2
            zetvn(i,j)=
     +         ((vn(ip,j,km)-vn(i,j,km))-
     +          (vn(i,j,km)-vn(im,j,km)))*recdx2
     +       + (vn(i,jp,km)+vn(i,jm,km)-2.*vn(i,j,km))*recdy2
          endif
        end do
      end do
c$omp end do
c$omp end parallel