I'm trying to write code that runs on parallel hardware using MPI and OpenMP. I have the following piece of code:
```c
#pragma omp parallel for private(k, temp_r)
for(j=0; j<size; j++){
    temp_r = b[j];
    for(k=0; k<rows; k++){
        temp_r = temp_r - A[j*rows + k] * x[k];
    }
    r[j] = temp_r;
}
```

I know this code could be improved further, because the inner for loop is a reduction. I know how to apply a reduction to a single for loop, but I'm not sure how to go about it here, since two nested for loops are involved. Any insight would be helpful.
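One common way to express this is sketched below. It assumes that `A` is stored row-major with `size` rows of length `rows`, as the indexing `A[j*rows + k]` suggests. The outer loop keeps `parallel for`, because each `r[j]` is independent and each thread already runs its inner loop privately, so no cross-thread reduction clause is needed. Only the inner loop is marked as a SIMD reduction with `#pragma omp simd reduction(+:sum)`. The function name `residual` and its parameter list are hypothetical, not taken from the question.

```c
#include <stddef.h>

/* Sketch: computes r = b - A*x, with A stored row-major as `size` rows of
   length `rows` (same indexing as A[j*rows + k] in the question).
   The name residual() and its signature are hypothetical. */
void residual(size_t size, size_t rows, const double *A,
              const double *x, const double *b, double *r)
{
    /* Outer loop: rows are independent, so threads split them without
       any synchronization. Loop variables declared here are private. */
    #pragma omp parallel for
    for (size_t j = 0; j < size; j++) {
        double sum = 0.0;
        /* Inner loop: a thread-local reduction, vectorized with SIMD.
           Accumulating with + and subtracting once at the end avoids the
           '-' reduction operator, which recent OpenMP versions deprecate. */
        #pragma omp simd reduction(+:sum)
        for (size_t k = 0; k < rows; k++) {
            sum += A[j * rows + k] * x[k];
        }
        r[j] = b[j] - sum;
    }
}
```

Compile with, for example, `gcc -O2 -fopenmp`. `omp simd` requires OpenMP 4.0 or later. Without `-fopenmp` the pragmas are ignored and the function runs serially. Because the SIMD reduction may reorder the additions, results can differ from the original loop in the last few bits.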
size/rows. On what system are you running the code? Ultimately, you will need to provide a minimal reproducible example to get a good answer.