I'm trying to implement the distance matrix in parallel using openmp in which I calculate the distance between each point and all the other points, so the best algorithm I thought of till now cost O(n^2) and the performance of my algorithm using openmp using 10 thread on 8processor machine isn't better than the serial approach in terms of running time, so I wonder if there is any mistake in my implementation on the openmp approach as this is my first time to use openmp, so please if there is any mistake in my apporach or any better "faster" approach please let me know. The following is my code where "dat" is a vector that contains the data points.
map <int, map< int, double> > dist; //construct the distance matrix int c=count(dat.at(0).begin(),dat.at(0).end(),delm)+1; #pragma omp parallel for shared (c,dist) for(int p=0;p<dat.size();p++) { for(int j=p+1;j<dat.size();j++) { double ecl=0; string line1=dat.at(p); string line2=dat.at(j); for (int i=0;i<c;i++) { double num1=atof(line1.substr(0,line1.find_first_of(delm)).c_str()); line1=line1.substr(line1.find_first_of(delm)+1).c_str(); double num2=atof(line2.substr(0,line2.find_first_of(delm)).c_str()); line2=line2.substr(line2.find_first_of(delm)+1).c_str(); ecl += (num1-num2)*(num1-num2); } ecl=sqrt(ecl); #pragma omp critical { dist[p][j]=ecl; dist[j][p]=ecl; } } }
#pragma omp criticalinside a doubly-nested loop and you're wondering where the bottleneck is? o.O