Python multithreading performance

Question

I've to process say thousands of records in an array. I did the normal for loop like this

for record in records: results = processFile(record) write_output_record(o, results)

The script above took 427.270612955 seconds!

As there is no dependancy between these records. I used Python multi threading module in a hope to speedup the process. below is my implementation

import multiprocessing from multiprocessing.dummy import Pool as ThreadPool pool = ThreadPool(processes=threads) results = pool.map(processFile, records) pool.close() pool.join() write_output(o, results)

My computer has 8 cpu's. And it takes 852.153398991 second. Can somebody help me as in what am I doing wrong?

PS: processFile function has no i/o's. its mostly processing the records and sending back the update record

can you paste the whole program please. if not please specify the data type for records (ie, list, queue, tuple, deque, etc) — dopstar
– dopstar, Commented Mar 20, 2017 at 6:29
It's difficult to say what's wrong with such little info on what your record and processFile look like. But I think that it is suspicious that the run time is now almost exactly double what it was. That hints at one single thing somewhere being doubled up and dominating the run time. — bazza
– bazza, Commented Mar 20, 2017 at 6:29
If your processFile is written in Python and "the juice" isn't done by some C function which releases the GIL, you are just spinning a lot of threads which get serialized by the GIL. In general, unless your code blocks on IO or on C code, adding threads in Python is virtually useless for this reason. — Matteo Italia
– Matteo Italia, Commented Mar 20, 2017 at 9:51

vishwarajanand · Accepted Answer · 2017-03-20 06:15:24Z

Try using vmstat and verify whether its a memory issue. Sometimes, using multithreading can slow your system down if each thread pushes up the RAM usage by a significant amount.

Usually people encounter three types of issues: CPU bound (Constraint on CPU computations), Memory bound (Constraint on RAM) and I/O bound (Network & hard drive I/O constraints).

Agreed!! It says 130% of cpu usage when I'm running with threads and It stays around 95% when I am not running with threads. So is it recommended in this case as not to use threading?
Then your code's performance is not being judged correctly by the machine. You can try to run the same code on a machine with possibly more RAM.

Collectives™ on Stack Overflow

Python multithreading performance

1 Answer 1

2 Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Related