
I have a Python script that works as follows: it reads a large file (e.g., a movie), composes selected information from it into a number of small temporary files, spawns a C++ application in subprocesses to process the files (separately for each file), and reads the application's output. To speed up the script I used multiprocessing. However, it has a major drawback: each process has to keep a whole copy of the large input file in RAM, so I can run only a few processes before I run out of memory. I therefore decided to try multithreading instead (or some combination of multiprocessing and multithreading), since threads share the address space. As the Python part spends most of its time on file I/O or waiting for the C++ application to complete, I thought the GIL would not be an issue here. Nevertheless, instead of a gain in performance I observe a drastic slowdown, mainly owing to the I/O part.

I illustrate the problem with the following code (saved as test.py):

    import sys, threading, tempfile, time

    nthreads = int(sys.argv[1])

    class IOThread(threading.Thread):
        def __init__(self, thread_id, obj):
            threading.Thread.__init__(self)
            self.thread_id = thread_id
            self.obj = obj

        def run(self):
            run_io(self.thread_id, self.obj)

    def gen_object(nlines):
        obj = []
        for i in range(nlines):
            obj.append(str(i) + '\n')
        return obj

    def run_io(thread_id, obj):
        ntasks = 100 // nthreads + (1 if thread_id < 100 % nthreads else 0)
        for i in range(ntasks):
            tmpfile = tempfile.NamedTemporaryFile('w+')
            with open(tmpfile.name, 'w') as ofile:
                for elem in obj:
                    ofile.write(elem)
            with open(tmpfile.name, 'r') as ifile:
                content = ifile.readlines()
            tmpfile.close()

    obj = gen_object(100000)
    starttime = time.time()
    threads = []
    for thread_id in range(nthreads):
        threads.append(IOThread(thread_id, obj))
        threads[thread_id].start()
    for thread in threads:
        thread.join()
    runtime = time.time() - starttime
    print('Runtime: {:.2f} s'.format(runtime))

When I run it with different numbers of threads, I get this:

    $ python3 test.py 1
    Runtime: 2.84 s
    $ python3 test.py 1
    Runtime: 2.77 s
    $ python3 test.py 1
    Runtime: 3.34 s
    $ python3 test.py 2
    Runtime: 6.54 s
    $ python3 test.py 2
    Runtime: 6.76 s
    $ python3 test.py 2
    Runtime: 6.33 s

Can someone explain the result to me and give some advice on how to parallelize I/O effectively using multithreading?

EDIT:

The slowdown is not due to HDD performance, because:

1) the files are cached in RAM anyway

2) the same operations with multiprocessing (not multithreading) do get faster (almost by a factor equal to the number of CPUs)
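
The multiprocessing variant of the test is not shown here. As a rough sketch of how such a comparison might be set up, the same worker can be run on a thread pool and on a process pool. The names `io_task` and `time_pool`, and the reduced task count, are assumptions made for illustration and are not taken from test.py:

```python
import multiprocessing
import multiprocessing.dummy
import tempfile
import time

# Same payload as gen_object(100000) in test.py.
OBJ = [str(i) + '\n' for i in range(100000)]


def io_task(task_id):
    """Write OBJ to a temporary file, read it back, and return the line count."""
    with tempfile.NamedTemporaryFile('w+') as tmpfile:
        for elem in OBJ:
            tmpfile.write(elem)
        tmpfile.seek(0)  # rewind so the read starts at the beginning
        return len(tmpfile.readlines())


def time_pool(pool_factory, nworkers, ntasks=20):
    """Run ntasks I/O tasks on a pool; return (runtime, per-task line counts)."""
    starttime = time.time()
    with pool_factory(nworkers) as pool:
        counts = pool.map(io_task, range(ntasks))
    return time.time() - starttime, counts


if __name__ == '__main__':
    # multiprocessing.dummy.Pool is backed by threads, multiprocessing.Pool by processes.
    for name, factory in (('threads', multiprocessing.dummy.Pool),
                          ('processes', multiprocessing.Pool)):
        runtime, _ = time_pool(factory, 2)
        print('{}: {:.2f} s'.format(name, runtime))
```

Because both pools expose the same `map` interface, only the factory passed to `time_pool` changes between the two runs.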

  • Sidenote - I always like to use multiprocessing and multiprocessing.dummy to easily test multiprocessing vs. multithreading problems. It offers a simple API and hassle-free switching between processes & threads. Commented Oct 16, 2015 at 13:15
  • Have you considered using memory-mapping? That would map the file into memory, but shared between processes. The OS would then perform the actual IO when necessary. It would also free up the RAM when it's not used any more, i.e. it doesn't swap. Commented Oct 16, 2015 at 18:13
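
The memory-mapping suggestion can be sketched roughly as follows. This is a minimal illustration, assuming the large input can be read in place as bytes; the helper `read_slice` and the stand-in file are hypothetical. Each process would map the file read-only, so the pages come from the OS page cache instead of being copied into every process:

```python
import mmap
import os
import tempfile


def read_slice(path, offset, length):
    """Return `length` bytes starting at `offset`, read through a read-only memory map."""
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == '__main__':
    # Stand-in for the large input file (hypothetical content).
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b'0123456789' * 1000)
        path = f.name
    try:
        print(read_slice(path, 10, 10))  # bytes 10..19 of the file
    finally:
        os.remove(path)
```

Slicing the map copies only the requested bytes into the process, so a worker that needs a small part of the input never holds the whole file.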

2 Answers

3

As I delved deeper into the problem, I benchmarked 4 different parallelisation methods, 3 of them in Python and 1 in Java (the purpose of the test was not to compare the I/O machinery of different languages but to see whether multithreading can speed up I/O operations). The test was performed on Ubuntu 14.04.3, with all files placed on a RAM disk.

Although the data are quite noisy, the trend is clear (see the chart; n=5 for each bar, error bars represent SD): Python multithreading fails to speed up the I/O. The most probable reason is the GIL, in which case there is no way around it.

[Chart: runtimes of the 4 parallelisation methods; n=5 per bar, error bars show SD]


-1

I think your performance measurements don't lie: you're asking your hard disk to do many things at the same time (reads, writes, fsync when closing the files, and so on), and on several files at once. That triggers a lot of physical hardware operations, and the more files you write at the same time, the more contention you get.

So the CPU is waiting for the disk operation to finish...

Moreover, maybe you don't have an SSD, so the syncs actually mean some physical movement.

EDIT: it could be a GIL problem. When you iterate over obj in run_io, you execute Python code between writes. ofile.write probably releases the GIL, so that the I/O doesn't block the other threads, but the lock is released and reacquired on each iteration. So maybe your writes don't really run "concurrently".

EDIT2: to test the hypothesis you can try to replace:

    for elem in obj:
        ofile.write(elem)

with:

    ofile.write("".join(obj))

and see whether performance improves.
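
One way to run this test, as a rough sketch: time both write styles with the same number of threads. The helper names (`write_loop`, `write_joined`, `time_threads`) and the task count are assumptions made for illustration and are not part of the question's script:

```python
import tempfile
import threading
import time


def write_loop(obj):
    """Write each element separately: one Python-level call per line."""
    with tempfile.TemporaryFile('w+') as ofile:
        for elem in obj:
            ofile.write(elem)


def write_joined(obj):
    """Build the whole content first, then hand it to a single write call."""
    with tempfile.TemporaryFile('w+') as ofile:
        ofile.write(''.join(obj))


def time_threads(writer, obj, nthreads, ntasks=20):
    """Split ntasks calls of writer(obj) across nthreads threads; return the runtime."""
    def worker(count):
        for _ in range(count):
            writer(obj)

    # Same split as in test.py: the first (ntasks % nthreads) threads get one extra task.
    counts = [ntasks // nthreads + (1 if t < ntasks % nthreads else 0)
              for t in range(nthreads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in counts]
    starttime = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.time() - starttime


if __name__ == '__main__':
    obj = [str(i) + '\n' for i in range(100000)]
    for writer in (write_loop, write_joined):
        for nthreads in (1, 2):
            runtime = time_threads(writer, obj, nthreads)
            print('{} with {} thread(s): {:.2f} s'.format(
                writer.__name__, nthreads, runtime))
```

If the per-iteration release and reacquisition of the GIL were the bottleneck, `write_joined` would be expected to scale better with the thread count than `write_loop`.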

2 Comments

Even in the worst-case scenario with the GIL, I would expect a comparable runtime, not a 2x slowdown
It does not get any better after the replacement
