If I understand your question right, you would like to have simultaneous row-wise broadcasts:
    P00 -> P01 & P02
    P10 -> P11 & P12
    P20 -> P21 & P22
This could be done using subcommunicators, e.g. one that only has processes from row 0 in it, another one that only has processes from row 1 in it and so on. Then you can issue simultaneous broadcasts in each subcommunicator by calling MPI_Bcast with the appropriate communicator argument.
Creating row-wise subcommunicators is extremely easy if you use a Cartesian communicator in the first place. MPI provides the MPI_CART_SUB operation for that. It works like this:
    // Create a 3x3 non-periodic Cartesian communicator from MPI_COMM_WORLD
    int dims[2] = { 3, 3 };
    int periods[2] = { 0, 0 };
    MPI_Comm comm_cart;

    // We do not want MPI to reorder our processes
    // That's why we set reorder = 0
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm_cart);

    // Split the Cartesian communicator row-wise
    int remaindims[2] = { 0, 1 };
    MPI_Comm comm_row;
    MPI_Cart_sub(comm_cart, remaindims, &comm_row);
Now comm_row will contain a handle to a new subcommunicator that spans only the row that the calling process is in. A single call to MPI_Bcast is now enough to perform three simultaneous row-wise broadcasts:
    MPI_Bcast(&data, data_count, MPI_DATATYPE, 0, comm_row);
This works because the comm_row returned by MPI_Cart_sub is a different communicator in processes located in different rows. The root argument 0 is the rank of the first process in the comm_row subcommunicator, which corresponds to P*0 (the process in column 0 of each row) because of the way the topology was constructed.
If you do not use a Cartesian communicator but operate on MPI_COMM_WORLD instead, you can use MPI_COMM_SPLIT to split the world communicator into three row-wise subcommunicators. MPI_COMM_SPLIT takes a color that is used to group processes into new subcommunicators: processes with the same color end up in the same subcommunicator. In your case the color should equal the number of the row that the calling process is in. The splitting operation also takes a key that is used to order processes within the new subcommunicator. It should equal the number of the column that the calling process is in, e.g.:
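    // Get the rank of the calling process in MPI_COMM_WORLD
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Compute grid coordinates based on the rank
    // (assumes ranks are laid out row by row on the 3x3 grid)
    int proc_row = rank / 3;
    int proc_col = rank % 3;

    // color = row, key = column
    MPI_Comm comm_row;
    MPI_Comm_split(MPI_COMM_WORLD, proc_row, proc_col, &comm_row);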
Once again comm_row will contain the handle of a subcommunicator that only spans the same row as the calling process.
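To check the grouping without launching an MPI job, the color/key arithmetic can be tabulated in plain C. This is only a sketch: the function names are made up for illustration, and it assumes world ranks are laid out row by row on the grid, as in the rank / 3 and rank % 3 computation above.

```c
#include <stdio.h>

/* Color for MPI_Comm_split: the row index of a row-major grid.
   Processes with the same color land in the same subcommunicator. */
int split_color(int world_rank, int ncols)
{
    return world_rank / ncols;
}

/* Key for MPI_Comm_split: the column index. Since the keys within
   one row are 0, 1, ..., ncols - 1, the key is also the rank that
   the process gets inside its row subcommunicator. */
int split_key(int world_rank, int ncols)
{
    return world_rank % ncols;
}

/* World rank of the process that becomes rank 0 (the broadcast
   root used above) in the subcommunicator of the given row. */
int row_root_world_rank(int row, int ncols)
{
    return row * ncols;
}

/* Print color and key for every world rank of an nrows x ncols grid */
void print_row_groups(int nrows, int ncols)
{
    for (int rank = 0; rank < nrows * ncols; rank++) {
        printf("world rank %d -> color (row) %d, key (column) %d\n",
               rank, split_color(rank, ncols), split_key(rank, ncols));
    }
}
```

For the 3x3 grid, world ranks 0, 3 and 6 get key 0, so they are the roots of the three row-wise broadcasts when MPI_Bcast is called with root 0 on comm_row.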