
I am developing an MPI parallel program designed specifically to solve Problem 2 on Project Euler. The original problem statement can be found here. My code compiles without errors, and the correct answer is returned consistently (which can be verified on the website).

However, I thought it would be worthwhile to use MPI_Wtime() to gather data on how long the MPI program takes to execute with 1, 2, 3, and 4 processes. To my surprise, I found that my program takes longer to execute as more processes are added. This is contrary to my expectations: per Amdahl's law, I expected increasing the number of processes to reduce the computation time (i.e., produce a speedup). I have included my code below for anyone who may be interested in testing this for themselves.
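(For reference, the standard statement of Amdahl's law, nothing specific to my code: if a fraction p of the runtime is parallelizable, the best possible speedup on N processes is

    S(N) = 1 / ((1 - p) + p / N)

so even with p = 0.95, four processes yield at most S(4) = 1 / (0.05 + 0.95/4) ≈ 3.48, and the law assumes the parallel overhead itself is negligible.)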

    #include <mpi.h>
    #include <stdio.h>
    #include <tgmath.h>

    int main(int argc, char* argv[]) {
        MPI_Init(&argc, &argv);

        int rank, size, start_val, end_val, upperLimit;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        upperLimit = 33;
        start_val = rank * (upperLimit / size) + 1;
        int num1 = 1;
        int num2 = 1;
        int SumNum = num1 + num2;
        int x = 0;
        double start, end;

        // begin timing
        start = MPI_Wtime();

        // arbitrarily inflate the number of computations
        // to make the program take longer to compute
        // (change to i < 1 for only 1 computation)
        for (int i = 0; i < 10000000; i++) {
            // define the range of the Fibonacci sequence each process handles
            if (rank == (size - 1)) {
                end_val = upperLimit;
            }
            else {
                end_val = start_val + (upperLimit / size) - 1;
            }

            /* Calculations done before writing this code indicate that it takes
               exactly 32 separate steps to reach the largest Fibonacci number
               before exceeding 4,000,000. This can be found with a simple
               computation, which is not listed in this code. */
            long double fibb_const = (1 + sqrt(5)) / 2;
            int j = start_val - 1;
            long double fibb_const1 = (1 - sqrt(5)) / 2;

            // calculate starting Fibonacci positions via the closed-form formula
            double position11 = (pow(fibb_const, j) - pow(fibb_const1, j)) / (sqrt(5));
            double position12 = (pow(fibb_const, j + 1) - pow(fibb_const1, (j + 1))) / (sqrt(5));
            position11 = floor(position11);
            position12 = floor(position12);

            // dynamically assign values to each process to generate a solution quickly
            if (rank == 0) {
                for (int i = start_val; i < end_val; i++) {
                    SumNum = num1 + num2;
                    num1 = num2;
                    num2 = SumNum;
                    if (SumNum % 2 == 0) {
                        x = x + SumNum;
                        //printf("Process 0 reports %d \n \n", SumNum);
                        //fflush(stdout);
                    }
                }
            }
            else {
                for (int i = start_val; i < end_val; i++) {
                    SumNum = position12 + position11;
                    if (SumNum % 2 == 0) {
                        x = x + SumNum;
                        //printf("Process %d reports %d \n \n", rank, SumNum);
                        //fflush(stdout);
                    }
                    position11 = position12;
                    position12 = SumNum;
                }
            }

            int recieve_buf = 0;
            MPI_Reduce(&x, &recieve_buf, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0) {
                //printf("This is the final solution: %d \n \n", recieve_buf);
                //fflush(stdout);
            }
        }

        // end timing
        end = MPI_Wtime();
        double elapsed_time = end - start;
        printf("I am rank %d, and I report a walltime of %f seconds.", rank, elapsed_time);

        // end the MPI code
        MPI_Finalize();
        return 0;
    }
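For reference, the closed form used above for the starting positions is Binet's formula (a standard identity, not something introduced by this program):

    F(n) = (phi^n - psi^n) / sqrt(5), where phi = (1 + sqrt(5))/2 and psi = (1 - sqrt(5))/2

which is what fibb_const and fibb_const1 implement.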

Note that I wrap the work in a for loop of 10,000,000 iterations to intentionally inflate the computation time.
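For what it's worth, here is a minimal sketch of the timing pattern I have seen recommended (my assumption of common practice, not part of the program above): synchronize the ranks before starting the clock, then report the maximum elapsed time across ranks, since the slowest rank determines the wall time:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char* argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Align all ranks before starting the clock so they time the same region.
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        /* ... the work being timed goes here ... */

        double elapsed = MPI_Wtime() - t0;

        // Reduce per-rank times to a single number: the slowest rank's time.
        double max_elapsed = 0.0;
        MPI_Reduce(&elapsed, &max_elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("Wall time (max over ranks): %f seconds\n", max_elapsed);

        MPI_Finalize();
        return 0;
    }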

I have attempted to verify this behavior by using time.h and chrono in alternate versions of this code to cross-reference my results. Consistently, the computation time increases as more processes are included. I saw a similar SO post here, but I could use an additional explanation.

How I Run my Code

I use mpiexec -n <process_count> <my_file_name>.exe to run my code from the Visual Studio 2022 command prompt. Additionally, I have tested this code on macOS by running mpicc foo.c followed by mpiexec -n <process_count> ./a.out. All my best efforts produce data contrary to my expectations.

Hopefully this question isn't too vague. I will provide more information if needed.

System Info

I am currently using an x64-based Lenovo PC running Windows 11. Thanks again.

  • I recently found some good information on several websites. I will work to implement those ideas and update my post as needed. Commented Jun 21, 2022 at 17:22
  • Isn't this a bit of a "sledgehammer to crack a nut"? The simple loop has only 31 iterations and completes in a fraction of a second, and PE allows one minute. The parallel processing setup time will probably vastly exceed its run time. Commented Jun 21, 2022 at 17:27
  • Neither parallel processing nor threading is guaranteed by itself to shorten run-time durations. There is much more to consider besides splitting the workload amongst multiple microprocessors or processes. Commented Jun 21, 2022 at 18:21
  • Weather Vane: this is indeed overkill. I half expected as much, because the original Project Euler problem can be solved with a single while loop. In this case I wanted to see whether the computation time could be improved by integrating MPI, but it may be better to apply MPI methods to a much larger problem. Really, I was looking for another way to practice parallel programming methods. Commented Jun 22, 2022 at 3:43

1 Answer


This is a case of the granularity being too fine. Granularity is the ratio of the work done between synchronization points to the cost of the synchronization itself. Say your MPI_Reduce takes one, or a couple of, microseconds. (A figure that has stayed fairly constant over the past few decades!) That is enough time to do a few thousand arithmetic operations, so for speedup to occur you need many thousands of operations between the reductions. You don't have that, so the runtime of your code is completely dominated by the cost of the MPI calls, and that cost does not go down with the number of processes.
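To make this concrete, here is a rough probe (a sketch under my own assumptions, not a rigorous benchmark) that compares the per-call cost of MPI_Reduce against the per-iteration cost of trivial arithmetic; the printed ratio is roughly how many arithmetic operations each reduction "costs":

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char* argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int reps = 100000;
        int x = 1, sum = 0;

        // Time many reductions to estimate the per-call synchronization cost.
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++)
            MPI_Reduce(&x, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        double t_reduce = MPI_Wtime() - t0;

        // Time the same count of trivial arithmetic; volatile keeps the
        // compiler from optimizing the loop away.
        volatile double acc = 0.0;
        t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++)
            acc += i * 1.0000001;
        double t_work = MPI_Wtime() - t0;

        if (rank == 0)
            printf("per-reduce: %g s, per-iteration: %g s, ratio: %g\n",
                   t_reduce / reps, t_work / reps, t_reduce / t_work);

        MPI_Finalize();
        return 0;
    }

With the reduction costing thousands of times more than one loop iteration, a loop body as small as the one in the question cannot amortize it, no matter how many processes participate.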
