
If a paralleled function is called by a parallel for loop, does the performance still improve?

Here is a simple example:

    int main(void) {
        ...
        #pragma omp parallel for
        for (int i = 0; i < 100; i++) {
            res[i] = parallel_function();
        }
        ...
    }

    int parallel_function() {
        ...
        #pragma omp parallel for
        for (int j = 0; j < 1000; j++) {
            // do something
        }
        ...
    }

In this simple case, I found that parallelizing the for loop in main did not improve performance. In another, more complicated case, it did.
So I'm not sure whether parallelizing an outer for loop that calls an already parallelized function can improve performance.
If it does improve performance, why? I thought that all available threads are already assigned to the for loop in main, so each thread has to execute parallel_function serially. Is that correct?
Thanks a lot!

  • 2
    First, do you have nested parallelism enabled? Have a look at stackoverflow.com/questions/65119234/… for a better understanding. Commented Aug 13, 2021 at 9:35
  • @dreamcrash No, I don't. So I guess that parallel_function will not be executed in parallel, right? Commented Aug 13, 2021 at 9:43
  • 1
    Yes, exactly, it would be executed sequentially. Commented Aug 13, 2021 at 9:44
  • 1
    Well, it is generally worth parallelizing the outer loop rather than the inner one, but the performance depends on many things (workload, load balance, cache utilization, etc.), so try different setups and settings. You can also consider combinations (e.g. on a NUMA system, 8 threads for the outer loop and 2 nearby threads for the inner loop). Without more details we can only give general advice like this. If you want to improve the performance of your program, use a profiler. Commented Aug 13, 2021 at 13:49
  • 2
    Nesting parallel loops is generally a pretty bad idea because of thread oversubscription (as pointed out by @dreamcrash). You can use the taskloop directive (or other task-based directives) to possibly solve this problem, as well as the worker starvation that can occur when the number of iterations is small (typically on many-core systems). Note that tasks still carry a small overhead. Prefer a collapsed parallel for (the collapse clause) when it is possible, because its overhead is smaller. When the number of iterations in the first loop is big enough, it is often better not to parallelize the inner loop. Commented Aug 13, 2021 at 18:41
