
If, in the presence of the GIL, all subtasks in a multithreaded program run sequentially, then the program is equivalent to a single-threaded one. Why were multithreading packages (multiprocessing.dummy.Pool, multiprocessing.pool.ThreadPool, concurrent.futures.ThreadPoolExecutor, etc.) written for Python? Why should someone use them? (This is not about multiprocessing.)

  • Multiprocessing uses processes, so it is quite different from an actual ThreadPool. Actual threads in Python can switch context, so although only one thread runs at any given moment, the tasks can appear to run in parallel. CPU-bound tasks will not see an improvement from normal threading.Thread, but I/O-bound tasks might. Commented Jun 18, 2019 at 13:58
  • @matt Can you give a simple example where an I/O-bound task might see some improvement? Commented Jun 18, 2019 at 14:08
  • One common use case is to write the "heavy lifting" computational code as a set of C functions, and then have a Python script call out to those functions to tie it all together. Since a Python thread doesn't (usually) need to keep the GIL locked while the C code is running, that allows multiple cores to compute simultaneously via the multiple Python threads. Commented Jun 18, 2019 at 14:24
  • Requesting a webpage. It takes some time to establish the connection, get a response, and start downloading. During some of these waiting times the thread is blocked, and the GIL can be handed to another available thread. This should be faster than a completely serial application, in which everything would have to wait on the blocking task. Commented Jun 18, 2019 at 14:35
  • Multithreading is a model of concurrency, of which parallel processing is just one special case. Threads have been used to represent concurrent activities within a process (e.g., different client threads within a server process) since at least a couple of decades before multi-CPU systems came on the market. (Don't ask me how I know!) Commented Jun 18, 2019 at 15:11
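To make the webpage example from the comments concrete, here is a minimal sketch of an I/O-bound workload run serially and then on a thread pool. It makes no real network requests: `fake_download`, the URLs, and the 0.2 second delay are all hypothetical, and `time.sleep` is only an assumed stand-in for waiting on a server (like blocking I/O, it releases the GIL while waiting).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url, delay=0.2):
    # Hypothetical stand-in for a network request: time.sleep releases the
    # GIL while it waits, much as a blocking socket read would.
    time.sleep(delay)
    return f"contents of {url}"


def run_serial(urls):
    # Each "download" must finish before the next one starts.
    return [fake_download(u) for u in urls]


def run_threaded(urls):
    # One worker thread per URL, so the waits overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(fake_download, urls))


def timed(fn, urls):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn(urls)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"https://example.invalid/page{i}" for i in range(4)]
    _, serial_time = timed(run_serial, urls)
    _, threaded_time = timed(run_threaded, urls)
    print(f"serial:   {serial_time:.2f} seconds")
    print(f"threaded: {threaded_time:.2f} seconds")
```

The threaded run should take roughly as long as one wait rather than the sum of all four. If `fake_download` did pure Python computation instead of waiting, the two timings would be expected to come out about the same.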

2 Answers


Two things:

  1. The GIL is released from time to time, for example when you execute a lengthy blocking operation (reading from a file, writing to a file, sending over the network, and so on). This allows operations to interleave: read in one thread, process in another.

  2. Multithreading is not only about performance; there are other benefits, for example ease of expression. Imagine you have two algorithms running in "parallel" that communicate with each other and exchange data. You put each algorithm on its own thread, synchronize (to make sure a thread switch happens when it can), and off you go. Without multithreading you would have to rely on event-driven programming, which is difficult and not easily scalable. Good examples are the various kinds of session-based servers, such as an FTP server. It is much easier to write a multithreaded multiuser FTP server than a single-threaded one.
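As a rough illustration of both points (read in one thread, process in another, with the two exchanging data), here is a minimal sketch using `queue.Queue` for synchronization. The names `reader`, `processor`, `run_pipeline`, and the `SENTINEL` end-of-stream marker are hypothetical, and upper-casing a string is only a placeholder for real processing.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def reader(lines, out_q):
    # Stand-in for a thread that reads from a file or socket.
    for line in lines:
        out_q.put(line)  # blocks while the queue is full
    out_q.put(SENTINEL)  # tell the processor that no more data is coming


def processor(in_q, results):
    # Stand-in for the second algorithm; it consumes whatever the reader sends.
    while True:
        item = in_q.get()  # blocks until the reader has produced something
        if item is SENTINEL:
            break
        results.append(item.upper())


def run_pipeline(lines):
    q = queue.Queue(maxsize=2)  # a small buffer forces the threads to take turns
    results = []
    t_read = threading.Thread(target=reader, args=(lines, q))
    t_proc = threading.Thread(target=processor, args=(q, results))
    t_read.start()
    t_proc.start()
    t_read.join()
    t_proc.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["first", "second", "third"]))
```

Each function reads as a plain sequential loop, and the queue handles the hand-off between them. That is the ease-of-expression benefit, regardless of whether the GIL lets the two threads run at the same instant.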

In general, yes: if you use threads purely for CPU-bound performance, Python's threads don't make much sense. But Python is chosen less for performance than for how easy it is to write (and modify) code. And threads help a lot with structuring concurrent code, even if they don't offer any multithreading performance benefits.


2 Comments

So you are saying that one of the advantages is refactoring code. Your point #1 is also about how you can refactor two tasks (def read(), def process()).
Yes, this is the main reason. If you look at multithreaded code and set aside the code that is multithreaded for performance reasons, you will see the same thing. As far as I know, threading in Python isn't used all that often.

multiprocessing.Pool does actually run code in a parallel fashion.

As the documentation states:

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.

Give it a try and check the docs for more info!

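```python
from multiprocessing import Pool
import time


def f(n):
    print('start sleeping')
    time.sleep(4)
    print('slept enough')
    return n*2


# The guard is needed on platforms that start worker processes by spawning
# (e.g. Windows, macOS), where each worker re-imports this module.
if __name__ == '__main__':
    # Run the four calls in four worker processes.
    s = time.time()
    with Pool(4) as p:
        r = p.map(f, [1,2,3,4])
    print(f'multiprocessing version runtime: {round(time.time()-s,2)} seconds')
    print(r)

    # Run the same four calls one after another.
    s = time.time()
    r = []
    for n in [1,2,3,4]:
        r.append(f(n))
    print(f'loop version runtime: {round(time.time()-s,2)} seconds')
    print(r)
```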

The output:

```
start sleeping
start sleeping
start sleeping
start sleeping
slept enough
slept enough
slept enough
slept enough
multiprocessing version runtime: 4.03 seconds
[2, 4, 6, 8]
start sleeping
slept enough
start sleeping
slept enough
start sleeping
slept enough
start sleeping
slept enough
loop version runtime: 16.01 seconds
[2, 4, 6, 8]
```
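Since the question is about thread pools specifically, here is a comparable sketch that swaps in `multiprocessing.pool.ThreadPool` (one of the classes the question names). This is not part of the answer's original code: `timed_thread_map` is a hypothetical helper, and the delay is shortened to 0.5 seconds as an assumption to keep the run quick.

```python
from multiprocessing.pool import ThreadPool
import time


def f(n, delay=0.5):
    # time.sleep releases the GIL, so other threads can run during the wait.
    time.sleep(delay)
    return n * 2


def timed_thread_map(values, workers=4):
    # Map f over values on a pool of threads; return results and elapsed time.
    s = time.time()
    with ThreadPool(workers) as p:
        r = p.map(f, values)
    return r, time.time() - s


if __name__ == '__main__':
    r, elapsed = timed_thread_map([1, 2, 3, 4])
    print(f'thread pool version runtime: {round(elapsed, 2)} seconds')
    print(r)
```

Because `f` only sleeps, the thread pool should also finish in about the time of a single call. The difference from `multiprocessing.Pool` would be expected to show up only when `f` does CPU-bound work in pure Python, where threads take turns holding the GIL.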
