This is a case of the granularity being too fine. Granularity is the amount of work between synchronization points versus the cost of synchronization. Let's say your MPI_Reduce takes a microsecond or two. (A figure that has stayed fairly constant over the past few decades!) That's enough time for a few thousand arithmetic operations, so for speedup to occur you need many thousands of operations between reductions. You don't have that, so your runtime is completely dominated by the cost of the MPI calls, and that cost does not go down with the number of processes.
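You can see this with a toy cost model (the figures below are hypothetical, just roughly in line with the numbers above: ~1 ns per operation, ~2 µs per reduction):

```python
# Toy cost model: time per iteration = (local work split over p processes)
# plus one reduction. Assumed (hypothetical) figures:
#   t_op   = 1e-9  seconds per arithmetic operation
#   t_sync = 2e-6  seconds per MPI_Reduce
t_op = 1e-9
t_sync = 2e-6

def time_per_iter(n_ops, p):
    """Modeled iteration time: n_ops operations over p processes + one sync."""
    return (n_ops / p) * t_op + t_sync

def speedup(n_ops, p):
    return time_per_iter(n_ops, 1) / time_per_iter(n_ops, p)

# Fine-grained: 100 ops between reductions -> sync dominates, speedup ~1.
# Coarse-grained: 1,000,000 ops between reductions -> near-linear speedup.
for n_ops in (100, 1_000_000):
    print(n_ops, [round(speedup(n_ops, p), 2) for p in (1, 2, 4, 8)])
```

With 100 operations per reduction the modeled speedup on 8 processes is barely above 1; with a million operations per reduction it is close to 8. That is the whole story: grow the work between synchronization points (or reduce how often you call MPI_Reduce) and the scaling comes back.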