Spawn a process in Python without forking

Question

I'm working with Python (2.7) and pymongo (3.3), and I need to spawn a child process to run a job asynchronously. Unfortunately pymongo is not fork-safe as described here (and I need to interact with the db before spawning the child process).

I ran an experiment using subprocess.Popen (with shell set to True and then False) and multiprocessing.Process. As far as I can tell they both fork the parent process to create the child process, but only multiprocessing.Process causes pymongo to print its warning that it has detected a forked process.

I'm wondering what the pythonic way of doing this is. It seems that perhaps os.system will do it for me but subprocess is described as an intended replacement for os.system so I wonder whether I'm missing something.
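For comparison, here is a minimal sketch of the two styles using only the standard library (it runs on 2.7 and 3). The subprocess documentation presents subprocess.call as the replacement for os.system. Both wait for the command to finish. subprocess.Popen instead returns as soon as the child has started, which is what an asynchronous job needs. The command is a placeholder that just sleeps for one second, and `blocking_run`/`background_run` are hypothetical helper names.

```python
import subprocess
import sys

# Placeholder job: a child Python process that sleeps for one second.
CMD = [sys.executable, "-c", "import time; time.sleep(1)"]


def blocking_run():
    # Like os.system, this waits for the command to finish;
    # it returns the child's exit code.
    return subprocess.call(CMD)


def background_run():
    # Returns as soon as the child has started; the caller decides when
    # (or whether) to wait for it.
    return subprocess.Popen(CMD)


if __name__ == "__main__":
    proc = background_run()
    print(proc.poll())     # None while the child is still running
    print(proc.wait())     # 0 once it has finished
    print(blocking_run())  # 0, after about a second
```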

Serge Ballesta · Accepted Answer · 2017-02-28 17:37:51Z

"Not fork-safe" does not mean that you cannot call fork... It just means that the child process should not use any PyMongo instance it inherited. When you use subprocess.Popen, the newly forked child almost immediately calls exec to be replaced by a shell instance (shell=True) or the requested executable (shell=False), so it is safe from PyMongo's point of view.
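A minimal sketch of why the exec step matters, using only the standard library. The parent holds a placeholder object standing in for a MongoClient (the name `client` and the helper `child_sees_parent_client` are hypothetical). The child started by subprocess.Popen is a fresh interpreter that has none of the parent's Python objects.

```python
import subprocess
import sys

# Hypothetical placeholder for a MongoClient the parent created before
# spawning (e.g. client = pymongo.MongoClient()).
client = object()


def child_sees_parent_client():
    """Start a new interpreter via Popen and ask whether it has `client`."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "print('client' in globals())"],
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate()
    return out.decode().strip()


if __name__ == "__main__":
    print(child_sees_parent_client())  # prints False
```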

Conversely, when you call multiprocessing.Process (which uses fork() on Unix in Python 2.7), the child really is a copy of the parent and does keep its PyMongo instances. Using those inherited instances in the child is unsafe, so the warning was correctly issued.
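The contrast can be sketched with the standard library alone. This assumes Python 3.4+ on Unix, where multiprocessing.get_context("fork") selects the fork start method explicitly (the only behaviour Python 2.7 offers on Unix). `parent_state` is a hypothetical stand-in for a MongoClient the parent created at runtime. The forked child sees the object the parent created, which is exactly the inheritance PyMongo warns about.

```python
import multiprocessing
import os

# Hypothetical stand-in for a MongoClient that the parent creates at runtime.
parent_state = {"client": None}


def report(queue):
    # Runs in the child. Under fork it sees the parent's memory as it was
    # at start(), including whatever was stored in parent_state.
    queue.put((os.getpid(), parent_state["client"]))


def fork_child_sees_parent_state():
    ctx = multiprocessing.get_context("fork")  # Unix only
    parent_state["client"] = "created-in-parent"
    queue = ctx.Queue()
    proc = ctx.Process(target=report, args=(queue,))
    proc.start()
    child_pid, seen = queue.get()  # read before join() to avoid blocking
    proc.join()
    return child_pid != os.getpid(), seen


if __name__ == "__main__":
    print(fork_child_sees_parent_state())  # prints (True, 'created-in-parent')
```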

A. Jesse Jiryu Davis · Accepted Answer · 2017-03-01 01:34:57Z

I think you misunderstand: PyMongo's documentation warns that a single MongoClient is not fork-safe, and you have interpreted that to mean that PyMongo prohibits your whole program from ever creating subprocesses.

A single MongoClient is not fork-safe, meaning you must not create it before forking and then use that same MongoClient object after forking. Using PyMongo in your program overall, or using one MongoClient before a fork and a different one after, is safe.

That's why subprocess.Popen is ok: you fork, then exec (to replace your program with a different one in the child process), and therefore you cannot possibly use the same MongoClient in the child afterward.
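One common way to follow the "different MongoClient after the fork" rule without passing clients through every call is to cache the client together with the PID of the process that created it, and build a new one whenever the PID has changed. This is a minimal sketch: `get_client` and its `factory` argument are hypothetical names (in a real program `factory` would be something like pymongo.MongoClient). The cache is not protected by a lock, so concurrent first calls from several threads could each build a client.

```python
import os

# Module-level cache: the client plus the PID of the process that built it.
_client = None
_client_pid = None


def get_client(factory):
    """Return a client owned by the current process.

    `factory` builds a new client (hypothetically pymongo.MongoClient).
    If the cached client was built by a different process, i.e. we are
    in a forked child, it is ignored and a fresh one is built here.
    """
    global _client, _client_pid
    pid = os.getpid()
    if _client is None or _client_pid != pid:
        _client = factory()
        _client_pid = pid
    return _client
```

Usage would look like `db = get_client(pymongo.MongoClient).mydb`. The child simply never uses the inherited client, which is what the rule asks for; the sketch does not try to close it explicitly.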

To quote the PyMongo FAQ:

On Unix systems the multiprocessing module spawns processes using fork(). Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. For example:

    # Each process creates its own instance of MongoClient.
    def func():
        db = pymongo.MongoClient().mydb
        # Do something with db.

    proc = multiprocessing.Process(target=func)
    proc.start()

Never do this:

    client = pymongo.MongoClient()

    # Each child process attempts to copy a global MongoClient
    # created in the parent process. Never do this.
    def func():
        db = client.mydb
        # Do something with db.

    proc = multiprocessing.Process(target=func)
    proc.start()

Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to inherent incompatibilities between fork(), threads, and locks. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.
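The underlying fork/threads/locks problem can be reproduced without PyMongo. This minimal sketch assumes Python 3 on Unix (it uses os.fork and Lock.acquire(timeout=...)); `hold_lock` and `child_can_acquire` are hypothetical names. A background thread holds a lock when the process forks. The child gets a copy of the locked lock but not the thread that would release it, so the child can never acquire it. MongoClient uses background threads and locks internally, which is why the FAQ calls this out. Python 3.12+ also emits a DeprecationWarning when forking a multithreaded process, which is this same hazard.

```python
import os
import threading
import time

lock = threading.Lock()


def hold_lock(release):
    # Take the lock and keep it until the parent signals `release`.
    with lock:
        release.wait()


def child_can_acquire():
    """Fork while another thread holds `lock`; return True if the child gets it."""
    release = threading.Event()
    holder = threading.Thread(target=hold_lock, args=(release,))
    holder.start()
    while not lock.locked():  # wait until the holder thread owns the lock
        time.sleep(0.01)

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so nothing will ever
        # release the copied lock. Give up after one second instead of hanging.
        acquired = lock.acquire(timeout=1)
        os._exit(0 if acquired else 1)

    # Parent: let the holder thread finish, then collect the child's result.
    release.set()
    holder.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print(child_can_acquire())  # prints False
```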

Aha, that makes sense. I was also curious about potential side effects of inheriting file descriptors (particularly socket handles) in the child process, but I guess Popen's close_fds argument addresses that too.
PyMongo creates its sockets with FD_CLOEXEC, so those descriptors are closed whether you pass close_fds or not.
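For reference, the close-on-exec flag can be inspected and set with the standard fcntl module (Unix only). This is a generic sketch, not PyMongo's own code, and `has_cloexec` and `set_cloexec` are hypothetical helper names. On Python 3.4+ new sockets are already created non-inheritable (PEP 446), so setting the flag by hand mainly matters on older versions such as 2.7.

```python
import fcntl
import socket


def has_cloexec(fd):
    """True if the close-on-exec flag is set on file descriptor `fd`."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_cloexec(fd):
    """Set close-on-exec so the descriptor is closed in any exec'd child."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        set_cloexec(sock.fileno())
        print(has_cloexec(sock.fileno()))  # prints True
    finally:
        sock.close()
```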

ShadowRanger · Accepted Answer · 2017-02-28 17:26:10Z

If you're able to move to Python 3.4 or higher (on Unix), you could, before using pymongo, set your multiprocessing start method to 'forkserver'. multiprocessing then starts a separate fork server process (a freshly executed interpreter, started the first time a child process is needed), and every later multiprocessing child is forked from that server rather than from your main process. So once the fork server is set up, your main process can use pymongo freely: the fork server has never used it, so forking from it causes no such issues.

Sadly, start methods were only added in 3.4, so it's not an option for 2.7, but if someone else has this issue, it may be useful to them.
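A minimal sketch of the idea for Python 3.4+ on Unix. It uses multiprocessing.get_context("forkserver") rather than the global multiprocessing.set_start_method("forkserver") so the snippet stays self-contained; the helper name `worker_parent_pid` is hypothetical. The pool runs the builtin os.getppid in its worker, and the PID it returns belongs to the fork server, not to the main process.

```python
import multiprocessing
import os


def worker_parent_pid():
    """Return the parent PID of a pool worker started via 'forkserver'."""
    ctx = multiprocessing.get_context("forkserver")  # Python 3.4+, Unix only
    pool = ctx.Pool(1)
    try:
        # os.getppid is a builtin, so it pickles by reference and runs
        # inside the worker process.
        return pool.apply(os.getppid)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # The worker was forked by the fork server, so its parent is not us.
    print(worker_parent_pid() != os.getpid())  # prints True
```

In a real program you would call multiprocessing.set_start_method("forkserver") once, under the `if __name__ == "__main__":` guard, before creating any MongoClient.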

Yes, that's exactly the sort of thing I was hoping for. I see now that pymongo will work after fork/exec, but forking from some fresh, unrelated process feels like a cleaner way to spawn child processes to me (maybe my Windows roots are showing). Unfortunately, moving from Python 2 to 3 feels like it's years away for our codebase :(
