I wrote this program to learn how to use multi-threading properly. I want to implement something similar in my own program:
```python
import math
import random
import time

import numpy as np
from threading import Thread


def powExp(x, r):
    for c in range(x.shape[1]):
        x[r][c] = math.pow(100, x[r][c])


def main():
    print()
    rows = 100
    cols = 100
    x = np.random.random((rows, cols))
    y = x.copy()

    start = time.time()
    threads = []
    for r in range(x.shape[0]):
        t = Thread(target=powExp, args=(x, r))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    end = time.time()
    print("Multithreaded calculation took {n} seconds!".format(n=end - start))

    start = time.time()
    for r in range(y.shape[0]):
        for c in range(y.shape[1]):
            y[r][c] = math.pow(100, y[r][c])
    end = time.time()
    print("Singlethreaded calculation took {n} seconds!".format(n=end - start))
    print()

    randRow = random.randint(0, rows - 1)
    randCol = random.randint(0, cols - 1)
    print("Checking random indices in x and y:")
    print("x[{rR}][{rC}]: = {n}".format(rR=randRow, rC=randCol, n=x[randRow][randCol]))
    print("y[{rR}][{rC}]: = {n}".format(rR=randRow, rC=randCol, n=y[randRow][randCol]))
    print()

    for r in range(x.shape[0]):
        for c in range(x.shape[1]):
            if x[r][c] != y[r][c]:
                print("ERROR NO WORK WAS DONE")
                print("x[{r}][{c}]: {n} == y[{r}][{c}]: {ny}".format(
                    r=r, c=c, n=x[r][c], ny=y[r][c]))
                quit()
    assert np.array_equal(x, y)


# fixed: the original had "if __name__ == main():", which calls main()
# and compares __name__ against its return value
if __name__ == "__main__":
    main()
```

As you can see from the code, the goal here is to parallelize the operation `math.pow(100, x[r][c])` by creating a thread for every row. However, this code is extremely slow, a lot slower than the single-threaded version.
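For what it's worth, the per-element Python loop is likely the dominant cost here: each `math.pow` call runs as interpreted bytecode. A minimal sketch of a vectorized alternative using NumPy's `np.power` ufunc, which runs the whole loop in C:

```python
import math

import numpy as np

x = np.random.random((100, 100))
y = x.copy()

# one vectorized call transforms the entire 100x100 array in C
x = np.power(100.0, x)

# element-wise reference version, as in the question
for r in range(y.shape[0]):
    for c in range(y.shape[1]):
        y[r][c] = math.pow(100, y[r][c])

assert np.allclose(x, y)
```

This removes the per-element interpreter overhead entirely, which typically matters far more than thread count for this kind of operation.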
Output:
```
Multithreaded calculation took 0.026447772979736328 seconds!
Singlethreaded calculation took 0.006798267364501953 seconds!

Checking random indices in x and y:
x[58][58]: = 9.792315687115973
y[58][58]: = 9.792315687115973
```

I searched through Stack Overflow and found some information about the GIL forcing Python bytecode to be executed on a single core only. However, I'm not sure that this is in fact what is limiting my parallelization. I tried rearranging the parallelized for-loop to use pools instead of threads. Nothing seems to be working.
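Since pools were mentioned: a process pool sidesteps the GIL because each worker is a separate interpreter with its own GIL. A sketch, chunking by row blocks rather than one task per row to keep the inter-process overhead down (the helper name `pow_rows` and the worker count are mine):

```python
import numpy as np
from multiprocessing import Pool


def pow_rows(chunk):
    # runs in a separate worker process, so the parent's GIL does not apply
    return np.power(100.0, chunk)


if __name__ == "__main__":
    x = np.random.random((100, 100))
    with Pool(processes=4) as pool:
        # split into 4 row blocks; each block is transformed in parallel
        result = np.vstack(pool.map(pow_rows, np.array_split(x, 4)))
    assert np.allclose(result, np.power(100.0, x))
```

Note that for an array this small the cost of pickling data between processes can easily exceed the computation itself; process pools tend to pay off only when each task does substantial work.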
Python code performance decreases with threading
EDIT: This thread discusses the same issue. Is it completely impossible to increase performance using multi-threading in Python because of the GIL? Is the GIL causing my slowdowns?
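It is not completely impossible: threads can help when the heavy work releases the GIL, which NumPy's C ufunc loops do for large arrays. A sketch under that assumption, mapping a vectorized worker over row chunks with a thread pool (array size and worker count are illustrative):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

x = np.random.random((200, 200))
expected = np.power(100.0, x)


def work(chunk):
    # NumPy's ufunc inner loop runs in C and releases the GIL,
    # so these calls can genuinely overlap across threads
    return np.power(100.0, chunk)


with ThreadPoolExecutor(max_workers=4) as ex:
    result = np.vstack(list(ex.map(work, np.array_split(x, 4))))

assert np.allclose(result, expected)
```

The key difference from the question's version is that each task hands a whole block to C code instead of doing per-element work in Python bytecode, which is serialized by the GIL.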
EDIT 2 (2017-01-18): So from what I can gather after searching online for quite a bit, it seems like Python is really bad at this kind of parallelism. What I'm trying to do is parallelize a Python function used in a neural network implemented in TensorFlow... it seems like adding a custom op is the way to go.