
The Python Py_INCREF macro is defined like this:

#define Py_INCREF(op) (                   \
    _Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA  \
    ((PyObject *)(op))->ob_refcnt++)

On a multi-core machine, the increment may happen only in the L1 cache and not be flushed to main memory.

If two threads increment the refcnt at the same time, on different cores, without a flush to main memory, it seems to me that one increment could be lost:

- ob_refcnt = 1
- Core 1 increments, but does not flush => ob_refcnt = 2 in the L1 cache of core 1
- Core 2 increments, but does not flush => ob_refcnt = 2 in the L1 cache of core 2
- one increment is lost

Is this a risk when using multiple cores or multiple processes?
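The lost-update scenario described above can be reproduced at the pure-Python level, where `count += 1` is a non-atomic read-modify-write. This is only an analogy for the C-level refcount question (the race here is on a plain Python integer, not on `ob_refcnt`):

```python
# A sketch of the lost-update race at the Python level (an analogy:
# this races on a plain integer, not on the C-level ob_refcnt).
import threading

count = 0

def naive_increment():
    global count
    for _ in range(200_000):
        count += 1  # read, add, write back: several steps, not atomic

threads = [threading.Thread(target=naive_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Without synchronization the final value can end up below 400000:
# one thread's write-back can overwrite the other's, losing increments.
print(count)
```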

PyObject is declared like this:

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

But Py_ssize_t is just an ssize_t or intptr_t.

The _Py_atomic* functions and attributes do not seem to be used.

How can Python manage this scenario? How does it flush the cache between threads?

  • A risk in what sense? Commented Feb 11, 2018 at 17:50
  • How Python can manage this scenario? Python can't manage any thread! All threads use separate Python shells; you can only start, stop, and pause actions. Data addresses are shared between processes (not copied or moved; only a shadow data image, a snapshot, is rendered). Commented Feb 14, 2018 at 15:20

2 Answers


The CPython implementation of Python has the global interpreter lock (GIL). It is undefined behaviour to call the vast majority of Python C API functions (including Py_INCREF) without holding this lock, and doing so will almost certainly result in inconsistent data or your program crashing.

The GIL can be released and acquired as described in the documentation.

Because of the need to hold this lock in order to operate on Python objects, multithreading in Python is pretty limited, and the only operations that parallelize well are things like waiting for I/O or pure C calculations on large arrays. The multiprocessing module (which starts isolated Python processes) is another option for parallel Python.


There have been attempts to use atomic types for reference counting (to remove or minimize the need for the GIL), but these caused significant slowdowns in single-threaded code, so they were abandoned.


2 Comments

In the code that manages the GIL, I suppose an mfence is issued to flush all the increments. Is that correct?
I'm not 100% sure. Internally it does seem to use some _Py_atomic calls, but I don't know if they're the right ones personally...

Why not use Python's Locks or Semaphores? https://docs.python.org/2/library/threading.html
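For application-level data this is sound advice: the GIL keeps the interpreter's own bookkeeping (such as ob_refcnt) consistent, but it does not make a Python-level read-modify-write atomic, so a threading.Lock is still needed. A minimal sketch:

```python
# A minimal example of threading.Lock: the lock serializes the
# read-modify-write, so no increment is lost.
import threading

count = 0
lock = threading.Lock()

def safe_increment():
    global count
    for _ in range(100_000):
        with lock:  # only one thread at a time executes this block
            count += 1

threads = [threading.Thread(target=safe_increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(count)  # 400000
```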

