7

As far as I understand, performance is what drives multi-threaded programming in most cases, though not all (irrespective of Java or Python).

I was reading this enlightening article on the GIL on SO. The article summarizes that Python adopts the GIL mechanism, i.e. only a single thread can execute Python bytecode at any given time. This makes single-threaded applications really fast.

My question is as follows:

Since only one thread is served at any given point, does the multiprocessing or thread module provide a way to overcome this limitation imposed by the GIL? If not, what features do they provide for doing real multi-tasking work?

A question was asked in the comments on the accepted answer of the above post, but nobody has answered it. I had the same question in mind:

^so at any time point of time, only one thread will be serving content to client... so no point of actually using multithreading to improve performance. right? 
1
  • 3
    Short answer: If your code is mostly waiting (for responses from the network for example), multithreading will work just fine to parallelize that waiting. If you're doing heavy computation and want to leverage all those cores, multiprocessing is what you need. Commented Jul 14, 2014 at 19:55

4 Answers 4

13

You're right about the GIL: there is no point in using multithreading for CPU-bound computation, as only one thread will be using the CPU at any given time.

But that previous statement may have enlightened you: if your computation is not CPU-bound, you may take advantage of multithreading.

A typical example is when your application spends most of its time waiting for something.

One of many examples of a non-CPU-bound program: say you want to build a web crawler. You have to crawl many websites and store them in a database. What costs time? Waiting for the servers to send data, actually downloading the data, and storing it in the database; nothing CPU-bound here. Here you may get a faster crawler by using a pool of crawlers instead of a single one. Typically, when one website is almost down and very slow to respond (~30 s), a single-threaded application will wait for that website the whole time, and you're stuck. In a multithreaded application, the other threads will continue crawling, and that's cool.
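The crawler idea can be sketched roughly as follows. This is only an illustration, assuming Python 3 and its concurrent.futures module; fake_fetch and crawl are hypothetical names, and the network request is simulated with time.sleep, which releases the GIL while waiting just as blocking I/O does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for a network request.

    time.sleep releases the GIL while it waits, so other threads can run.
    """
    time.sleep(delay)
    return url, len(url)


def crawl(urls, max_workers=5):
    """Fetch all URLs with a pool of threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, urls))


if __name__ == "__main__":
    urls = ["http://example.com/page%d" % i for i in range(5)]
    start = time.perf_counter()
    results = crawl(urls)
    elapsed = time.perf_counter() - start
    print("fetched %d pages in %.2fs" % (len(results), elapsed))
```

With five workers the five simulated waits overlap, so the whole run should take roughly as long as one wait instead of five in a row.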

On the other hand, as there is one GIL per process, you may use multiprocessing to do CPU-bound computation.
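A minimal sketch of that, assuming Python 3: each worker process has its own interpreter and its own GIL, so CPU-bound calls can run on several cores at once. count_primes is a hypothetical workload chosen only because it keeps the CPU busy; the limits and the pool size are arbitrary.

```python
import math
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20000, 20000, 20000, 20000]
    # Each task runs in a separate process, so the GIL of one
    # process does not block the others.
    with Pool(processes=4) as pool:
        print(pool.map(count_primes, limits))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module (Windows, for instance); without it each worker would try to create its own pool.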

As a side note, there exist some more or less partial implementations of Python without the GIL. I'd like to mention one that I think is well on its way to achieving something cool: PyPy STM. Searching for "get rid of the GIL", you'll easily find a lot of threads on the subject.


6 Comments

Does inter-process communication happen in multiprocessing? How is the state of shared objects preserved or accessed in multiprocessing?
RTFM: docs.python.org/2/library/multiprocessing.html has sections about exchanging objects, synchronization, and sharing state.
"Waiting for the servers to send data, actually downloading the data, and storing it in the database, nothing CPU bound here" -- does that mean I can have a computer with just RAM, connect to the internet, and browse? Sorry for the stupid question, but I would like to know the distinction here.
Just to make it explicit: I/O-bound operations can actually run concurrently across threads because Python will release the GIL while any type of blocking I/O operation is running. The only exception to this rule would be if a poorly written C extension fails to release the GIL while it does blocking I/O. In that case, you'll be stuck in the thread running the I/O until the I/O completes.
@brainstorm The multiprocessing module is capable of sharing objects between processes, but there's a higher cost to do that than there would be with threads. The objects being sent between processes need to be pickled, sent to the other process via a socket, then unpickled on the other side. This is mostly invisible to you as a client of the library, but it is much slower than shared state between threads in a single process. There are some other options for sharing state (shared memory with ctypes, multiprocessing.Manager) but those have some drawbacks as well.
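To illustrate the pickling step described in the last comment, here is a small sketch that uses the pickle module directly, with no processes involved. The state dictionary is made up for the example. The point is that what arrives on the other side is a copy, so changes to it are not seen by the sender.

```python
import pickle

# Hypothetical state that one process wants to hand to another.
state = {"visited": ["a.example", "b.example"]}

# Sender side: the object is serialized to bytes.
wire_bytes = pickle.dumps(state)

# Receiver side: the bytes are rebuilt into a new, independent object.
received = pickle.loads(wire_bytes)
received["visited"].append("c.example")

print(state["visited"])     # the sender's list is unchanged
print(received["visited"])  # only the receiver's copy grew
```

This is why state shared through queues or pipes has to be sent back explicitly if the parent process needs the updated value.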
2

Multiprocessing side-steps the GIL issue because code runs in a separate process while the GIL is only concerned with a single process. Within a process, multithreading may be faster to the extent that threads are waiting for some relatively slow resource like the disk or network.

4 Comments

Correct me if I'm wrong: so with multiprocessing, the same bytecode can be executed in different processes. But there cannot be shared objects between these processes, I guess. (If there can, where are the references held?)
With multiprocessing you can achieve concurrency because there's one Python interpreter for each process you spawn. You can share data between processes (for example, using Queues), see docs.python.org/2/library/….
You can share information between processes: docs.python.org/2/library/…
multiprocessing is complex: you need to read the docs, but as others say, there are several ways to share data. It works differently on Linux and Windows, and only runs faster if the processing time is significantly greater than the time to transfer data.
1

A quick Google search yielded this informative slideshow: http://www.dabeaz.com/python/UnderstandingGIL.pdf

But what it fails to present is the fact that all threads are contained within a process, and that under the GIL the threads of one process effectively execute Python bytecode on only one CPU (or core) at a time. So while the per-process GIL does manage the threads in said process and doesn't always deliver the expected performance, multithreading should still, at large scales, perform better than single-threaded operation when the threads spend much of their time waiting.


1

The GIL is always a hot topic in Python, but usually a meaningless one. It makes most programs much safer. If you want real computational performance, try PyOpenCL. Any modern real-world high-performance number crunching should be done on GPUs (OpenCL also runs happily on CPUs). It has no GIL issues.

If you want to do multithreading in Python to improve I/O-bound performance, the GIL is not an issue there.

Lastly, if you want to utilize multiple CPUs to increase the performance of your pure number crunching, and in a Pythonic fashion, use multiprocessing.

But it's still not as fast as coding your multithreaded application in assembly. Good luck not making typos.

