
Commit b07bfb3 (0 parents)

Initial commit

File tree

14 files changed, +880 -0 lines

.gitignore

Lines changed: 6 additions & 0 deletions

bin/seq
bin/omp
bin/thread
bin/thread2
bin/mpi
data/*txt

Makefile

Lines changed: 21 additions & 0 deletions

CC=gcc
CFLAGS= -Wall -std=gnu99 -g -fopenmp
LIBS=src/matrix.c
TUNE= -O2

all: sequential omp thread2 mpi

sequential:
	$(CC) $(TUNE) $(CFLAGS) -o bin/seq $(LIBS) src/sequential.c

omp:
	$(CC) $(TUNE) $(CFLAGS) -o bin/omp $(LIBS) src/omp.c

thread:
	$(CC) $(TUNE) $(CFLAGS) -pthread -o bin/thread $(LIBS) src/thread.c

thread2:
	$(CC) $(TUNE) $(CFLAGS) -pthread -o bin/thread2 $(LIBS) src/thread2.c

mpi:
	mpicc $(TUNE) $(CFLAGS) -o bin/mpi $(LIBS) src/mpi.c

README.md

Lines changed: 203 additions & 0 deletions

# Hochleistungsrechnen Assignment 1

##### Parallel Matrix Multiplication Using OpenMP, Pthreads, and MPI

----------

## Assignment

The multiplication of two matrices is to be implemented as

* a sequential program
* an OpenMP shared-memory program
* an explicitly threaded program (using the pthread standard)
* a message-passing program using the MPI standard

## Matrix multiplication

The aim is to multiply two matrices. Two matrices can only be multiplied if the number of columns of the first matrix matches the number of rows of the second matrix. Because the entries of the result matrix can be computed independently of each other, the calculation can be parallelized.
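
Concretely, each entry of the result matrix C = A · B is the dot product of a row of the first matrix with a column of the second (with n the number of columns of A):

    c_{ij} = \sum_{k=1}^{n} a_{ik} \, b_{kj}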

## Project Tree

    .
    |-- bin
    |   |-- mpi
    |   |-- omp
    |   |-- seq
    |   `-- thread2
    |-- data
    |   |-- mat_4_5.txt
    |   `-- mat_5_4.txt
    |-- src
    |   |-- matrix.c
    |   |-- matrix.h
    |   |-- mpi.c
    |   |-- omp.c
    |   |-- sequential.c
    |   |-- thread2.c
    |   `-- thread.c
    |-- Makefile
    |-- random_float_matrix.py
    |-- README.md
    |-- README.pdf
    `-- Test-Script.sh

`README.*` contains this document as a Markdown file and as a PDF file.
The Python script `random_float_matrix.py` generates `n x m` float matrices (it is inspired by Philipp Böhm's solution).
`./Test-Script.sh` generates test matrices with the Python script, compiles the C programs with `make`, and runs the different binaries on the test matrices. The script prints the execution times of the individual implementations.

## Makefile

    CC=gcc
    CFLAGS= -Wall -std=gnu99 -g -fopenmp
    LIBS=src/matrix.c
    TUNE= -O2

    all: sequential omp thread2 mpi

    sequential:
        $(CC) $(TUNE) $(CFLAGS) -o bin/seq $(LIBS) src/sequential.c

    omp:
        $(CC) $(TUNE) $(CFLAGS) -o bin/omp $(LIBS) src/omp.c

    thread:
        $(CC) $(TUNE) $(CFLAGS) -pthread -o bin/thread $(LIBS) src/thread.c

    thread2:
        $(CC) $(TUNE) $(CFLAGS) -pthread -o bin/thread2 $(LIBS) src/thread2.c

    mpi:
        mpicc $(TUNE) $(CFLAGS) -o bin/mpi $(LIBS) src/mpi.c

`make` builds all implementations; the resulting binaries are placed in the `bin/` directory.
The implementation `thread2.c` is the final solution of the *thread* subtask. `thread.c` was my first runnable solution, but it is slow (one thread per row). I kept it anyway for comparison.
For compiler optimization I chose `-O2`, which gave the best execution times.

## Example

Every implementation takes two matrix files as program arguments and writes the result matrix to `stdout` (`bin/seq mat_file_1.txt mat_file_2.txt`).
The rows are separated by newlines (`\n`) and the columns by tabs (`\t`), which keeps the output readable in the shell. All implementations calculate with floating-point numbers.

    [mp432@localhost]% cat data/mat_4_5.txt
    97.4549968447 4158.04953246 2105.6723138 9544.07472156 2541.05960201
    1833.23353473 9216.3834844 8440.75797842 1689.62403742 4686.03507194
    5001.05053096 7289.39586628 522.921369146 7057.57603906 7637.9829023
    737.191477364 4515.30312019 1370.71005027 9603.48073923 7543.51110732

    [mp432@localhost]% cat data/mat_5_4.txt
    8573.64127861 7452.4636398 9932.62634628 1261.340226
    7527.08499112 3872.81522875 2815.39747607 5735.65492468
    7965.24212592 7428.31976294 290.255638815 5940.92582147
    6175.98390217 5995.21703679 6778.73998746 9060.90690747
    2006.95378498 6098.70324661 619.384482373 1396.62426963

    [mp432@localhost]% bin/seq data/mat_4_5.txt data/mat_5_4.txt
    112949567.256707 105187212.450287 79556423.335490 126508582.287008
    172162416.208937 150764506.000392 60962563.539173 127174399.969315
    160826865.507086 158278548.934611 122920214.859773 125839554.344572
    125675943.680898 136743486.943968 90204309.448167 132523052.230353
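
The matrix files themselves are read by `src/matrix.c`, which is not shown in this commit view. A minimal sketch of how such a tab/newline-separated file could be parsed (the function name `read_matrix` and the two-pass approach are assumptions, not the project's actual code):

    /* Sketch only: read a matrix whose columns are tab-separated and whose
     * rows are newline-separated, as produced by random_float_matrix.py. */
    #include <stdio.h>
    #include <stdlib.h>

    static double **read_matrix(const char *path, int *rows, int *cols) {
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); exit(EXIT_FAILURE); }

        /* first pass: count rows and columns */
        *rows = 0; *cols = 0;
        int c, cols_in_line = 1, saw_data = 0;
        while ((c = fgetc(f)) != EOF) {
            if (c == '\t') cols_in_line++;
            else if (c == '\n') {
                if (saw_data) { (*rows)++; if (*cols == 0) *cols = cols_in_line; }
                cols_in_line = 1; saw_data = 0;
            } else saw_data = 1;
        }
        if (saw_data) { (*rows)++; if (*cols == 0) *cols = cols_in_line; }

        /* second pass: read the values (fscanf skips tabs and newlines) */
        rewind(f);
        double **m = malloc(*rows * sizeof(double *));
        for (int i = 0; i < *rows; i++) {
            m[i] = malloc(*cols * sizeof(double));
            for (int j = 0; j < *cols; j++) {
                if (fscanf(f, "%lf", &m[i][j]) != 1) {
                    fprintf(stderr, "malformed matrix file: %s\n", path);
                    exit(EXIT_FAILURE);
                }
            }
        }
        fclose(f);
        return m;
    }

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s matrix_file\n", argv[0]);
            return 1;
        }
        int rows, cols;
        double **m = read_matrix(argv[1], &rows, &cols);
        printf("read a %d x %d matrix, first element: %f\n", rows, cols, m[0][0]);
        return 0;
    }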

## Implementations

### Sequential

The sequential program serves as the baseline for performance and correctness comparisons with the other implementations. The following excerpt shows how the result matrix is computed.

    for (int i = 0; i < result_matrix->rows; i++) {
        for (int j = 0; j < result_matrix->cols; j++) {
            for (int k = 0; k < m_1->cols; k++) {
                result_matrix->mat_data[i][j] += m_1->mat_data[i][k] *
                                                 m_2->mat_data[k][j];
            }
        }
    }

### Thread (POSIX Threads)

`sysconf(_SC_NPROCESSORS_ONLN)` from `<unistd.h>` returns the number of online processors, which is used as the number of threads in order to use the machine's full capacity. The following excerpt shows the allocation of the thread handles.

    int number_of_proc = sysconf(_SC_NPROCESSORS_ONLN);
    ...
    // Allocate thread handles
    pthread_t *threads;
    threads = (pthread_t *) malloc(number_of_proc * sizeof(pthread_t));
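
The creation of the threads is not shown above. A minimal sketch of how the rows could be split into blocks and handed to the threads, assuming the `matrix_struct` type from `src/matrix.h` and the variables from the excerpt (the names `thread_arg_t` and `worker` are illustrative; this is not the actual `thread2.c`):

    /* Illustrative sketch: one contiguous block of result rows per thread.
     * Requires <pthread.h> and <stdlib.h>; matrix_struct comes from src/matrix.h. */
    typedef struct {
        int row_start, row_end;                 /* half-open row range [start, end) */
        matrix_struct *m_1, *m_2, *result;
    } thread_arg_t;

    static void *worker(void *p) {
        thread_arg_t *a = p;
        for (int i = a->row_start; i < a->row_end; i++)
            for (int j = 0; j < a->result->cols; j++)
                for (int k = 0; k < a->m_1->cols; k++)
                    a->result->mat_data[i][j] +=
                        a->m_1->mat_data[i][k] * a->m_2->mat_data[k][j];
        return NULL;
    }

    /* ... in main(), after the allocation shown above ... */
    thread_arg_t *args = malloc(number_of_proc * sizeof(thread_arg_t));
    int rows_per_thread = (result_matrix->rows + number_of_proc - 1) / number_of_proc;

    for (int t = 0; t < number_of_proc; t++) {
        args[t].row_start = t * rows_per_thread;
        args[t].row_end   = (t + 1) * rows_per_thread;
        if (args[t].row_end > result_matrix->rows)
            args[t].row_end = result_matrix->rows;
        args[t].m_1 = m_1;  args[t].m_2 = m_2;  args[t].result = result_matrix;
        pthread_create(&threads[t], NULL, worker, &args[t]);
    }
    for (int t = 0; t < number_of_proc; t++)
        pthread_join(threads[t], NULL);
    free(args);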

### Open Multi-Processing (OpenMP)

The standard shared-memory programming model is the fork/join model.
The OpenMP implementation is simply the sequential program with the pragma `#pragma omp parallel for` placed above the outer for loop. The pragma is applied only to the outer loop, because only its iterations are independent of each other.
Performance increased by about 40 percent compared to the sequential implementation.

    #pragma omp parallel for
    for (int i = 0; i < result_matrix->rows; i++) {
        for (int j = 0; j < result_matrix->cols; j++) {
            for (int k = 0; k < m_1->cols; k++) {
                result_matrix->mat_data[i][j] += m_1->mat_data[i][k] *
                                                 m_2->mat_data[k][j];
            }
        }
    }
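
OpenMP typically starts one thread per available core. If the thread count needs to be fixed explicitly (for example to compare against the pthread version), this can be done with `omp_set_num_threads()` or the `OMP_NUM_THREADS` environment variable; a small, self-contained example (not part of `omp.c`):

    /* Illustrative only: fix and query the OpenMP thread count. */
    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        omp_set_num_threads(4);     /* alternatively: export OMP_NUM_THREADS=4 */
        #pragma omp parallel
        {
            #pragma omp single
            printf("running with %d threads\n", omp_get_num_threads());
        }
        return 0;
    }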

### Message Passing Interface (MPI)

One difficulty was distributing the data to the workers.
First, the matrix dimensions are broadcast to the workers via `MPI_Bcast(&matrix_properties, 4, MPI_INT, 0, MPI_COMM_WORLD);`.

After this step the matrix sizes are known. Each 2-dimensional matrix is then converted into a 1-dimensional array, which makes the matrix data easier and safer to distribute.

The following function takes a matrix struct and returns a 1-dimensional data array.

    double *mat_2D_to_1D(matrix_struct *m) {
        double *matrix = malloc( (m->rows * m->cols) * sizeof(double) );
        for (int i = 0; i < m->rows; i++) {
            memcpy( matrix + (i * m->cols), m->mat_data[i], m->cols * sizeof(double) );
        }
        return matrix;
    }
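
After this conversion, element `(i, j)` of the original matrix sits at index `i * cols + j` of the flat array (row-major order). For example, with `flat` standing for the array returned by `mat_2D_to_1D`:

    /* row-major access into the flattened array */
    double a_ij = flat[i * m->cols + j];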

In the second step, the matrix data is broadcast to the workers. Each worker uses its MPI `rank` to determine which part of the result matrix it computes. A disadvantage of this implementation is that all of the data has to be distributed first.
In the third step, the data is collected via

    MPI_Gather(result_matrix, number_of_rows,
               MPI_DOUBLE, final_matrix,
               number_of_rows, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

At the end, the master prints the result matrix.
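
Putting the three steps together, the worker-side computation could look roughly like the following sketch (the names `a`, `b`, `local_result`, `rows_a`, `cols_a`, `cols_b`, and `world_size` are illustrative, the row count is assumed to be divisible by the number of processes, and this is not the actual `mpi.c`):

    /* Illustrative sketch: every rank multiplies its own block of rows of the
     * flattened matrices a (rows_a x cols_a) and b (cols_a x cols_b). */
    int rows_per_rank = rows_a / world_size;      /* assumes rows_a % world_size == 0 */
    int row_offset    = rank * rows_per_rank;

    for (int i = 0; i < rows_per_rank; i++) {
        for (int j = 0; j < cols_b; j++) {
            double sum = 0.0;
            for (int k = 0; k < cols_a; k++)
                sum += a[(row_offset + i) * cols_a + k] * b[k * cols_b + j];
            local_result[i * cols_b + j] = sum;
        }
    }
    /* the row blocks are then collected on rank 0 with the MPI_Gather call shown above */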

> To compile and run the MPI implementation, `mpicc` and `mpirun` have to be in the search path (e.g. `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib/`).

## Performance Test

The `sirius` cluster was not available while this assignment was being worked on (in particular for the MPI program), so all performance tests were run on `atlas`.

    [mp432@atlas Parallel-Matrix-Multiplication-master]$ ./Test-Script.sh
    generate test-matrices with python if no test data found

    generate 5x4 matrix...
    generate 100x100 matrix...
    generate 1000x1000 matrix...
    generate 5000x5000 matrix...
    compile...

    gcc -O2 -Wall -std=gnu99 -g -fopenmp -o bin/seq src/matrix.c src/sequential.c
    gcc -O2 -Wall -std=gnu99 -g -fopenmp -o bin/omp src/matrix.c src/omp.c
    gcc -O2 -Wall -std=gnu99 -g -fopenmp -pthread -o bin/thread2 src/matrix.c src/thread2.c
    mpicc -O2 -Wall -std=gnu99 -g -fopenmp -o bin/mpi src/matrix.c src/mpi.c

    calculate...

    * * * * * * * 100x100 Matrix
    with sequential 0m0.032s
    with omp        0m0.034s
    with thread2    0m0.032s
    with mpi(4p)    0m1.242s

    * * * * * * * 1000x1000 Matrix
    with sequential 0m11.791s
    with omp        0m4.182s
    with thread2    0m4.153s
    with mpi(4p)    0m12.682s

    * * * * * * * 5000x5000 Matrix
    with sequential 26m52.342s
    with omp        4m57.186s
    with thread2    5m5.767s
    with mpi(4p)    5m2.174s

The reported times are the `real` times from the Unix `time` command.
The advantage of parallel computation is clearly visible in the last run: for large matrices, the parallel implementations are roughly five times faster than the sequential one.
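
For the 5000x5000 case this factor follows directly from the measured real times, e.g. for the OpenMP version: 26m52.342s / 4m57.186s ≈ 1612 s / 297 s ≈ 5.4.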

Test-Script.sh

Lines changed: 86 additions & 0 deletions

#!/bin/bash
echo "generate test-matrices with python if no test data found"
echo
if [ ! -f data/mat_5_4.txt ]; then
    echo "generate 5x4 matrix..."
    python random_float_matrix.py 5 4 > data/mat_5_4.txt
fi

if [ ! -f data/mat_4_5.txt ]; then
    python random_float_matrix.py 4 5 > data/mat_4_5.txt
fi

if [ ! -f data/mat_100x100.txt ]; then
    echo "generate 100x100 matrix..."
    python random_float_matrix.py 100 100 > data/mat_100x100.txt
fi

if [ ! -f data/mat_100x100b.txt ]; then
    python random_float_matrix.py 100 100 > data/mat_100x100b.txt
fi

if [ ! -f data/mat_1000x1000.txt ]; then
    echo "generate 1000x1000 matrix..."
    python random_float_matrix.py 1000 1000 > data/mat_1000x1000.txt
fi
if [ ! -f data/mat_1000x1000b.txt ]; then
    python random_float_matrix.py 1000 1000 > data/mat_1000x1000b.txt
fi


if [ ! -f data/mat_5000x5000a.txt ]; then
    echo "generate 5000x5000 matrix..."
    python random_float_matrix.py 5000 5000 > data/mat_5000x5000a.txt
fi
if [ ! -f data/mat_5000x5000b.txt ]; then
    python random_float_matrix.py 5000 5000 > data/mat_5000x5000b.txt
fi

echo "compile..."
echo
make
echo
echo "calculate..."
echo
echo "* * * * * * * 100x100 Matrix"
cal_t=$((time bin/seq data/mat_100x100.txt data/mat_100x100b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with sequential $cal_t"

cal_t=$((time bin/omp data/mat_100x100.txt data/mat_100x100b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with omp $cal_t"

cal_t=$((time bin/thread2 data/mat_100x100.txt data/mat_100x100b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with thread2 $cal_t"

cal_t=$((time mpirun -np 4 bin/mpi data/mat_100x100.txt data/mat_100x100b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with mpi(4p) $cal_t"
echo

echo "* * * * * * * 1000x1000 Matrix"
cal_t=$((time bin/seq data/mat_1000x1000.txt data/mat_1000x1000b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with sequential $cal_t"

cal_t=$((time bin/omp data/mat_1000x1000.txt data/mat_1000x1000b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with omp $cal_t"

cal_t=$((time bin/thread2 data/mat_1000x1000.txt data/mat_1000x1000b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with thread2 $cal_t"

cal_t=$((time mpirun -np 4 bin/mpi data/mat_1000x1000.txt data/mat_1000x1000b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with mpi(4p) $cal_t"
echo

echo "* * * * * * * 5000x5000 Matrix"
cal_t=$((time bin/seq data/mat_5000x5000a.txt data/mat_5000x5000b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with sequential $cal_t"

cal_t=$((time bin/omp data/mat_5000x5000a.txt data/mat_5000x5000b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with omp $cal_t"

cal_t=$((time bin/thread2 data/mat_5000x5000a.txt data/mat_5000x5000b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with thread2 $cal_t"

cal_t=$((time mpirun -np 4 bin/mpi data/mat_5000x5000a.txt data/mat_5000x5000b.txt) 2>&1 > /dev/null | grep real | awk '{print $2}')
echo "with mpi(4p) $cal_t"
echo

bin/.gitkeep

Whitespace-only changes.

data/.gitkeep

Whitespace-only changes.

random_float_matrix.py

Lines changed: 24 additions & 0 deletions

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# this small helper script generates test matrices of the given dimension and
# outputs them to stdout so that you can pipe it into files. The elements are
# separated by '\t'.
#
# Synopsis:
# ./gen_matrices.py 4 4
#
# Author: Philipp Böhm

import sys
import random

try:
    dim_x, dim_y = int(sys.argv[1]), int(sys.argv[2])
except Exception, e:
    sys.stderr.write("error parsing matrix dimensions ...\n")
    raise


for row in xrange(0, dim_x):
    print "\t".join([ unicode(random.uniform(0, 9999)) for x in xrange(0, dim_y) ])
