0

If I have this code,

subroutine min_distance(r,n,k,centroid,distance,indices,distancereg) integer, intent(out):: n,k real,dimension(:,:),intent(in),allocatable::centroid real,dimension(:,:),intent(in),allocatable::r integer,dimension(:),intent(out),allocatable::indices,distancereg real ::d_min integer::y,i_min,j,i integer,parameter :: data_dim=2 allocate (indices(n)) allocate (distancereg(k)) !cost=0.d0 do j=1,n i_min = -1 d_min=1.d6 do i=1,k distance=0.d0 distancereg(i)=0.d0 do y=1,data_dim distance = distance+abs(r(y,j)-centroid(y,i)) distancereg(i)=distancereg(i)+abs(r(y,j)-centroid(y,i)) end do if (distance<d_min) then d_min=distance i_min=i end if end do if( i_min < 0 ) print*," found error by assigning k-index to particle ",j indices(j)=i_min end do 

What I want to do is, when I calculate distance for each k, I want to paralelize it. ie. Assign each thread to do it. For example if k=3, then for k=1 the distance calculated by thread 1, and so on. I have tried with omp_nested, omp_ordered, but still showing some error. will appreciate if there is any advice / guidance .

Thanks

2
  • Once again, it's quite difficult to understand what you are trying to achieve. There are 3 loops, which one(s) do you want to parallelize? Do you need to parallelize more than one loop? If yes, why? Commented Nov 15, 2022 at 14:34
  • Thanks, but I am a beginner in parallelizing. I just want to make each thread to calculate distance, and distancereg (i). If I want to do that, my assumption is enough to parallelize only i over k, so the second loop. Commented Nov 15, 2022 at 15:02

2 Answers 2

1

If you want to parallelize a loop (or loop nest) you have to wonder first which iterations are independent. In your case, each outer j iteration computes an i_min value that is 1. initialized in each i iteration, and 2. written into location (j). So each i_min calculation is independent and you can make the j loop parallel. (You also have d_min but that is never used.)

If the j loop is long enough that should be enough to get high performance. You might be tempted to look at the next loop over i. It computes a separate distance value for each iteration, so that's again parallel. Except that you update i_min,d_min, so you need to declare that loop a reduction.

However, the two loops are not "perfectly nested", so you can not spread the total i,j iteration space over the threads.

TLDR: your outer j loop can be parallelized.

Sign up to request clarification or add additional context in comments.

2 Comments

So, is it better to parallelize J loop ? For example, J will be looped 6000 time (which is 6000 datapoints). And parallized by 4 thread (i.e 1500 each). Then is it possible for each thread to do all the task (as you can see, this loop is to compute distance and also assigning the index ). So, what I mean for J=1, 1500 , indexing and distance is computed by 1st thread, . And so on. Sorry if my question is unclear hope you understand it
anyway, thanks for your explanation. I have better understanding now
0

What simply about:

 do j=1,n distancereg(:)=0.d0 !$OMP PARALLEL DO PRIVATE(y) do i=1,k do y=1,data_dim distancereg(i)=distancereg(i)+abs(r(y,j)-centroid(y,i)) end do end do !$OMP PARALLEL END DO indices(j)=minloc(distancereg,dim=1) end do 

Since you are storing the distances for each i, the search for the minimum value can be postponed after the loop on i

Or parallelizing the outer loop (here you don't need to store the distances):

 !$OMP PARALLEL DO PRIVATE(i,y,i_min,d_min,distance) do j=1,n i_min = -1 d_min=1.d6 do i=1,k distance=0.d0 do y=1,data_dim distance = distance+abs(r(y,j)-centroid(y,i)) end do if (distance<d_min) then d_min=distance i_min=i end if end do if( i_min < 0 ) print*," found error by assigning k-index to particle ",j indices(j)=i_min end do !$OMP END PARALLEL DO 

2 Comments

How its work if I parallel the J loop ? For example, J will be looped 6000 time (which is 6000 datapoints). And parallized by 4 thread (i.e 1500 each). Then is it possible for each thread to do all the task (as you can see, this loop is to compute distance and also assigning the index ). So, what I mean for J=1, 1500 , indexing and distance is computed by 1st thread, . And so on. Sorry if my question is unclear hope you understand it
By default the compiler assigns iterations to the different threads with a deterministic pattern. And although this is not specified, I think that most of time it assigns contiguous chunks of iterations (i.e. in your case the thread 0 handles the iterations 1..1500, the thread 1 the iterations 1501..3000, and so on). You can modify the way the iterations are assigned to the threads by using the schedule clause.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.