EDIT:
The slowdown is not due to HDD performance, because:
- the files are cached in RAM anyway
- the same operations with multiprocessing (not multithreading) do get faster (by almost a factor of the number of CPUs)
I have a Python script that works as follows: it reads a large file (e.g., a movie), writes selected information from it into a number of small temporary files, spawns a C++ application in subprocesses to process each file separately, and reads the application's output. To speed up the script I used multiprocessing. However, this has a major drawback: each process has to keep a full copy of the large input file in RAM, so I can run only a few processes before I run out of memory. I therefore decided to try multithreading instead (or some combination of multiprocessing and multithreading), since threads share the address space. Because the Python part spends most of its time on file I/O or waiting for the C++ application to complete, I thought the GIL would not be an issue here. Nevertheless, instead of a performance gain I observe a drastic slowdown, mainly in the I/O part.
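To make the scheme concrete, here is a minimal sketch of the threaded variant described above. It is not the actual script: `process_chunk`, `run_pipeline`, and `WORKER_CMD` are hypothetical names, and a small child Python process that counts lines stands in for the real C++ application.

```python
import os
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the C++ application: a child Python process
# that counts the lines of the file passed as its first argument.
WORKER_CMD = [
    sys.executable,
    "-c",
    "import sys; print(sum(1 for _ in open(sys.argv[1])))",
]


def process_chunk(lines):
    """Write one chunk to a temp file, run the external tool on it, return its output."""
    # delete=False so the child process can open the file by name on any platform.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.writelines(lines)
        path = tmp.name
    try:
        result = subprocess.run(
            WORKER_CMD + [path], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()
    finally:
        os.remove(path)


def run_pipeline(data, chunk_size, max_workers=4):
    """Split the shared in-memory data into chunks and process them in threads."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Threads share `data`, so the large input is held in RAM only once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_chunk, chunks))
```

For example, `run_pipeline([f"{i}\n" for i in range(10)], chunk_size=4)` processes three chunks and returns one output string per chunk, in input order.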
I illustrate the problem with the following code (saved as test.py):
    import sys, threading, tempfile, time

    nthreads = int(sys.argv[1])

    class IOThread (threading.Thread):
        def __init__(self, thread_id, obj):
            threading.Thread.__init__(self)
            self.thread_id = thread_id
            self.obj = obj
        def run(self):
            run_io(self.thread_id, self.obj)

    def gen_object(nlines):
        obj = []
        for i in range(nlines):
            obj.append(str(i) + '\n')
        return obj

    def run_io(thread_id, obj):
        ntasks = 100 // nthreads + (1 if thread_id < 100 % nthreads else 0)
        for i in range(ntasks):
            tmpfile = tempfile.NamedTemporaryFile('w+')
            with open(tmpfile.name, 'w') as ofile:
                for elem in obj:
                    ofile.write(elem)
            with open(tmpfile.name, 'r') as ifile:
                content = ifile.readlines()
            tmpfile.close()

    obj = gen_object(100000)
    starttime = time.time()
    threads = []
    for thread_id in range(nthreads):
        threads.append(IOThread(thread_id, obj))
        threads[thread_id].start()
    for thread in threads:
        thread.join()
    runtime = time.time() - starttime
    print('Runtime: {:.2f} s'.format(runtime))

When I run it with different numbers of threads, I get this:
    $ python3 test.py 1
    Runtime: 2.84 s
    $ python3 test.py 1
    Runtime: 2.77 s
    $ python3 test.py 1
    Runtime: 3.34 s
    $ python3 test.py 2
    Runtime: 6.54 s
    $ python3 test.py 2
    Runtime: 6.76 s
    $ python3 test.py 2
    Runtime: 6.33 s

Can someone explain this result and give some advice on how to effectively parallelize I/O using multithreading?