This is a case of the granularity being too fine. Granularity means the amount of work between synchronization points relative to the cost of synchronization. Say your MPI_Reduce takes a microsecond or two. (A figure that has stayed fairly constant over the past few decades!) That's enough time for a few thousand arithmetic operations, so for speedup to occur you need many thousands of operations between reductions. You don't have that, so your runtime is completely dominated by the cost of the MPI calls, and that cost does not go down with the number of processes.
