If a parallelized function is called from a parallel for loop, does the performance still improve?
Here is a simple example:
```c
int main(void) {
    ...
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        res[i] = parallel_function();
    }
    ...
}

int parallel_function() {
    ...
    #pragma omp parallel for
    for (int j = 0; j < 1000; j++) {
        // do something
    }
    ...
}
```

In this simple case, I found that parallelizing the for loop in `main` did not improve performance. But in another, more complicated case, performance did improve.
So I'm not sure whether parallelizing an outer for loop that calls an already parallelized function can improve performance.
If the performance does improve, why? I thought that all available threads would already be assigned to the for loop in `main`, so each thread would have to execute `parallel_function` serially. Is that correct?
Thanks a lot!
`parallel_function` will not be executed in parallel, right?

You can use the `taskloop` directive (or other `task`-based directives) to solve this problem, as well as possible worker starvation caused by a small number of iterations (typically on many-core systems). Note that tasks still carry a small overhead. Prefer a collapsed parallel for (the `collapse` clause) when possible, because its overhead is lower. When the number of iterations in the outer loop is large enough, it is often better not to parallelize the inner loop.
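
To make the three options above concrete, here is a minimal sketch in C. It is not the code from the question: `work`, `fill_outer_only`, `fill_collapsed`, `fill_taskloop`, and `row_sum_tasks` are hypothetical names, and `work` is just a stand-in for the "do something" body. It assumes a compiler with OpenMP 4.5 support (array-section reductions and `taskloop`); without OpenMP enabled the pragmas are ignored and everything runs serially with the same results.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the "do something" body in the question. */
static long long work(int i, int j) {
    return (long long)i * 1000 + j;
}

/* Option 1: parallelize only the outer loop; each thread runs the
 * inner loop serially. Often the best choice when n_outer is large. */
void fill_outer_only(long long *res, int n_outer, int n_inner) {
    assert(res != NULL && n_outer > 0 && n_inner >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n_outer; i++) {
        long long sum = 0;
        for (int j = 0; j < n_inner; j++) {
            sum += work(i, j);
        }
        res[i] = sum;
    }
}

/* Option 2: collapse both loops into one iteration space. The loops
 * must be perfectly nested, so the per-row sum is accumulated through
 * an array-section reduction (OpenMP 4.5). */
void fill_collapsed(long long *res, int n_outer, int n_inner) {
    assert(res != NULL && n_outer > 0 && n_inner >= 0);
    for (int i = 0; i < n_outer; i++) {
        res[i] = 0;
    }
    #pragma omp parallel for collapse(2) reduction(+ : res[:n_outer])
    for (int i = 0; i < n_outer; i++) {
        for (int j = 0; j < n_inner; j++) {
            res[i] += work(i, j);
        }
    }
}

/* Inner function for option 3: splits one row into chunks and creates
 * one task per chunk instead of opening a nested parallel region. */
static long long row_sum_tasks(int i, int n_inner) {
    const int chunk = 250;  /* assumed chunk size, tune for real work */
    const int n_chunks = (n_inner + chunk - 1) / chunk;
    long long sum = 0;
    /* taskloop has an implicit taskgroup, so all chunk tasks have
     * finished before sum is read below. */
    #pragma omp taskloop shared(sum)
    for (int c = 0; c < n_chunks; c++) {
        const int begin = c * chunk;
        const int end = (begin + chunk < n_inner) ? begin + chunk : n_inner;
        long long partial = 0;
        for (int j = begin; j < end; j++) {
            partial += work(i, j);
        }
        #pragma omp atomic
        sum += partial;
    }
    return sum;
}

/* Option 3: one thread generates tasks for the outer loop; the inner
 * function generates more tasks, and the whole team executes them. */
void fill_taskloop(long long *res, int n_outer, int n_inner) {
    assert(res != NULL && n_outer > 0 && n_inner >= 0);
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskloop
        for (int i = 0; i < n_outer; i++) {
            res[i] = row_sum_tasks(i, n_inner);
        }
    }
}
```

With GCC or Clang this would typically be built with `-fopenmp`. All three functions fill `res` with the same values, so they can be timed against each other on the real workload; which one wins depends on the iteration counts and on how expensive each iteration is.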