Threading vs Multiprocessing

Question

Suppose i have a table with 100000 rows and a python script which performs some operations on each row of this table sequentially. Now to speed up this process should I create 10 separate scripts and run them simultaneously that process subsequent 10000 rows of the table or should I create 10 threads to process rows for better execution speed ?

Have a look at docs.python.org/3/library/concurrent.futures.html. It sounds like you want map from either a ProcessPoolExecutor or ThreadPoolExecutor. Which one you want depends on the nature of the operation you're mapping (CPU bound vs IO bound respectively). — Chris
– Chris, Commented Feb 20, 2019 at 14:03

Community · Accepted Answer · 2020-06-20 09:12:55Z

1

Threading

Due to the Global Interpreter Lock, python threads are not truly parallel. In other words only a single thread can be running at a time.
If you are performing CPU bound tasks then dividing the workload amongst threads will not speed up your computations. If anything it will slow them down because there are more threads for the interpreter to switch between.
Threading is much more useful for IO bound tasks. For example if you are communicating with a number of different clients/servers at the same time. In this case you can switch between threads while you are waiting for different clients/servers to respond

Multiprocessing

As Eman Hamed has pointed out, it can be difficult to share objects while multiprocessing.

Vectorization

Libraries like pandas allow you to use vectorized methods on tables. These are highly optimized operations written in C that execute very fast on an entire table or column. Depending on the structure of your table and the operations that you want to perform, you should consider taking advantage of this

edited Jun 20, 2020 at 9:12

CommunityBot

11 silver badge

answered Feb 20, 2019 at 13:56

Arran Duff

1,5142 gold badges15 silver badges27 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Solomon Slow Over a year ago

FWIW: Python threads do run concurrently. That is to say, more than one thread can be "in progress" at the same time. Due to the GIL, they can't achieve true parallelism. I.e., two or more different CPUs simultaneously executing instructions on behalf of two or more different threads.

Arran Duff Over a year ago

@SolomonSlow I'm not sure I understand what you mean when you say that more than one thread can be in progress at the same time? The GIL is a lock that must be acquired for a thread to run. Since only one thread can acquire the lock at any one time - it is my understanding that only one thread can run at any one time....

Solomon Slow Over a year ago

By, "in progress," I meant that the thread has been started, but it has not yet finished. It may or it may not be running, but either way it has a context, which consists (at least) of a stack of unfinished function calls (including all of their arguments and local variables), and an instruction pointer that tells what statement the thread will execute next. The GIL allows only one Python thread to run at any given instant in time, but any number of threads can be started-but-not-yet-finished, and that's what computer scientists usually define "concurrently" to mean.

Solomon Slow Over a year ago

It's like how, you can tell your friends or neighbors that, "I'm remodeling my kitchen," when in fact, at that exact moment, you actually are working out at the gym. The remodel can still be "in progress" even though you actually are doing something else.

Arran Duff Over a year ago

@SolomonSlow thanks for the explanation. I hadn't realised the difference between concurrent and parallel. Good to know. I've corrected my answer

FranG · Accepted Answer · 2019-02-20 14:13:33Z

Process threads have in common a continouous(virtual) memory block known as heap processes don't. Threads also consume less OS resources relative to whole processes(seperate scripts) and there is no context switching happening.

The single biggest performance factor in multithreaded execution when there no locking/barriers involved is data access locality eg. matrix multiplication kernels.

Suppose data is stored in heap in a linear fashion ie. 0-th row in [0-4095]bytes, 1st row in[4096-8191]bytes, etc. Then thread-0 should operate in 0,10,20, ... rows, thread-1 operate in 1,11,21,... rows, etc.

The main idea is to have a set of 4K pages kept in physical RAM and 64byte blocks kept in L3 cache and operate on them repeatedly. Computers usually assume that if you 'use' a particular memory location then you're also gonna use adjacent ones, and you should do your best to do so in your program. The worst case scenario is accessing memory locations that are like ~10MiB apart in a random fashion so don't do that. Eg. If a single row is 1310720 doubles(64B) in size, then your threads should operate in a intra-row(single row) rather inter-row(above) fashion.

Benchmark your code and depending on your results, if your algorithm can process around 21.3GiB/s(DDR3-2666Mhz) of rows then you have a memory-bound task. If your code is like 1GiB/s processing speed, then you have a compute-bound task meaning executing instructions on data takes more time than fetching data from RAM and you need to either optimize your code or reach higher IPC by utilizing AVXx instructions sets or buy a newer processesor with more cores or higher frequency.

Collectives™ on Stack Overflow

Threading vs Multiprocessing

2 Answers 2

Threading

Multiprocessing

Vectorization

5 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Threading

Multiprocessing

Vectorization

5 Comments

Comments

Linked

Related