151

Is it possible to create a Python Pool that is non-daemonic? I want a pool to be able to call a function that has another pool inside.

I want this because daemon processes cannot create processes. Specifically, it will cause the error:

AssertionError: daemonic processes are not allowed to have children 

For example, consider the scenario where function_a has a pool which runs function_b which has a pool which runs function_c. This function chain will fail, because function_b is being run in a daemon process, and daemon processes cannot create processes.
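
For instance, a minimal script along these lines (function names as in the example above) triggers the error:

import multiprocessing

def function_c(x):
    return x * x

def function_b(n):
    # This runs inside a daemonic worker of the outer pool, so creating
    # another pool here raises:
    # AssertionError: daemonic processes are not allowed to have children
    with multiprocessing.Pool(2) as inner_pool:
        return sum(inner_pool.map(function_c, range(n)))

def function_a():
    with multiprocessing.Pool(2) as outer_pool:
        return outer_pool.map(function_b, [3, 4])

if __name__ == '__main__':
    print(function_a())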

4
  • AFAIK, no, it's not possible: all the workers in the pool are daemonized, and it's not possible to inject the dependency. BTW, I don't understand the second part of your question, I want a pool to be able to call a function that has another pool inside, and how that interferes with the fact that the workers are daemonized. Commented Aug 7, 2011 at 18:29
  • 9
    Because if function a has a pool which runs function b, which has a pool which runs function c, there's a problem in b: it is being run in a daemon process, and daemon processes cannot create processes. AssertionError: daemonic processes are not allowed to have children Commented Aug 7, 2011 at 18:32
  • 2
    Instead of importing as from multiprocessing import Pool, use from concurrent.futures import ProcessPoolExecutor as Pool Commented Jan 18, 2021 at 23:49
  • Once again Python devs kneecapping our programs for no good reason. Thanks for the useless handcuffs. :( Commented Jun 20, 2024 at 16:51

11 Answers

156

The multiprocessing.pool.Pool class creates the worker processes in its __init__ method, makes them daemonic and starts them, and it is not possible to re-set their daemon attribute to False before they are started (and afterwards it's not allowed anymore). But you can create your own sub-class of multiprocessing.pool.Pool (multiprocessing.Pool is just a wrapper function) and substitute your own multiprocessing.Process sub-class, which is always non-daemonic, to be used for the worker processes.

Here's a full example of how to do this. The important parts are the two classes NoDaemonProcess and MyPool at the top and to call pool.close() and pool.join() on your MyPool instance at the end.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import multiprocessing
# We must import this explicitly, it is not imported by the top-level
# multiprocessing module.
import multiprocessing.pool
import time

from random import randint


class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    def _get_daemon(self):
        return False
    def _set_daemon(self, value):
        pass
    daemon = property(_get_daemon, _set_daemon)

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class MyPool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

def sleepawhile(t):
    print("Sleeping %i seconds..." % t)
    time.sleep(t)
    return t

def work(num_procs):
    print("Creating %i (daemon) workers and jobs in child." % num_procs)
    pool = multiprocessing.Pool(num_procs)

    result = pool.map(sleepawhile,
        [randint(1, 5) for x in range(num_procs)])

    # The following is not really needed, since the (daemon) workers of the
    # child's pool are killed when the child is terminated, but it's good
    # practice to cleanup after ourselves anyway.
    pool.close()
    pool.join()
    return result

def test():
    print("Creating 5 (non-daemon) workers and jobs in main process.")
    pool = MyPool(5)

    result = pool.map(work,
        [randint(1, 5) for x in range(5)])

    pool.close()
    pool.join()
    print(result)

if __name__ == '__main__':
    test()

19 Comments

I just tested my code again with Python 2.7/3.2 (after fixing the "print" lines) on Linux, and with Python 2.6/2.7/3.2 on OS X. Linux and Python 2.7/3.2 on OS X work fine, but the code does indeed hang with Python 2.6 on OS X (Lion). This seems to be a bug in the multiprocessing module, which got fixed, but I haven't actually checked the bug tracker.
This should really be fixed in the multiprocessing module (an option for non-daemonic workers should be available). Does anyone know who maintains it?
Nice work. If anyone is getting a memory leak with this, try using "with closing(MyPool(processes=num_cpu)) as pool:" to dispose of the pool properly
What's the disadvantages of using MyPool instead of the default Pool? In other words, in exchange for the flexibility of starting child processes, what costs do I pay? (If there were no costs, presumably the standard Pool would have used non-daemonic processes).
@machen Yes, unfortunately that's true. In Python 3.6 the Pool class has been extensively refactored, so Process isn't a simple attribute anymore, but a method, which returns the process instance it gets from a context. I tried overwriting this method to return a NoDaemonPool instance, but this results in the exception AssertionError: daemonic processes are not allowed to have children when the Pool is used.
61

I needed to employ a non-daemonic pool in Python 3.7 and ended up adapting the code posted in the accepted answer. Below is the snippet that creates the non-daemonic pool:

import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass


class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class NestablePool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super(NestablePool, self).__init__(*args, **kwargs)

As the current implementation of multiprocessing has been extensively refactored to be based on contexts, we need to provide a NoDaemonContext class that has our NoDaemonProcess as an attribute. NestablePool will then use that context instead of the default one.
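
For example, assuming the classes above are defined at module level, NestablePool can be used as a drop-in replacement for multiprocessing.Pool (the square and work functions below are just for illustration):

import multiprocessing

def square(x):
    return x * x

def work(n):
    # Runs in a NestablePool worker, which is non-daemonic, so it is
    # allowed to start a regular (daemonic) pool of its own.
    with multiprocessing.Pool(2) as inner:
        return inner.map(square, range(n))

if __name__ == '__main__':
    with NestablePool(2) as outer:
        print(outer.map(work, [3, 4]))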

That said, I should warn that there are at least two caveats to this approach:

  1. It still depends on implementation details of the multiprocessing package, and could therefore break at any time.
  2. There are valid reasons why multiprocessing made it so hard to use non-daemonic processes, many of which are explained here. The most compelling in my opinion is:

As for allowing children threads to spawn off children of its own using subprocess runs the risk of creating a little army of zombie 'grandchildren' if either the parent or child threads terminate before the subprocess completes and returns.

4 Comments

Regarding the caveat: My use case is parallelising tasks, but the grand-children return information to their parents that in turn return information to their parents after doing some required local processing. Consequently, every level / branch has an explicit wait for all its leaves. Does the caveat still apply if you explicitly have to wait for spawned processes to finish?
Would you bother adding how to use this instead of multiprocessing.pool?
"You can now use multiprocessing.Pool and NestablePool interchangeably".
Thanks! Exactly what I needed. :) The devs' rationale is completely irrational. "Hey you might do bad stuff with it". Guess what, I can still call os.fork() and forkbomb any system. They better remove that next. And os.remove, that sounds dangerous, definitely has to go. What about this open(outfile, "w") stuff? You could overwrite data, can't let that happen. Here's your oven mitts and bubble wrap suit, have fun kids! :/
41

As of Python 3.8, concurrent.futures.ProcessPoolExecutor doesn't have this limitation. It can have a nested process pool with no problem at all:

from concurrent.futures import ProcessPoolExecutor as Pool
from itertools import repeat
from multiprocessing import current_process
import time

def pid():
    return current_process().pid

def _square(i):  # Runs in inner_pool
    square = i ** 2
    time.sleep(i / 10)
    print(f'{pid()=} {i=} {square=}')
    return square

def _sum_squares(i, j):  # Runs in outer_pool
    with Pool(max_workers=2) as inner_pool:
        squares = inner_pool.map(_square, (i, j))
    sum_squares = sum(squares)
    time.sleep(sum_squares ** .5)
    print(f'{pid()=}, {i=}, {j=} {sum_squares=}')
    return sum_squares

def main():
    with Pool(max_workers=3) as outer_pool:
        for sum_squares in outer_pool.map(_sum_squares, range(5), repeat(3)):
            print(f'{pid()=} {sum_squares=}')

if __name__ == "__main__":
    main()

The above demonstration code was tested with Python 3.8.

A limitation of ProcessPoolExecutor, however, is that it doesn't have maxtasksperchild (an equivalent max_tasks_per_child parameter was only added in Python 3.11). If you need this, consider the answer by Massimiliano instead.

Credit: answer by jfs

9 Comments

This is now clearly the best solution, as it requires minimal changes.
works perfectly! ... as a side-note, using a child multiprocessing.Pool inside a ProcessPoolExecutor is also possible!
Unfortunately this doesn't work for me, still getting daemonic processes are not allowed to have children
@RoyShilkrot Which version of Python are you using exactly?
Python 3.7. The problem was that this was run from Celery, and I had to use import billiard as multiprocessing and use their Pool.
28

The multiprocessing module has a nice interface to use pools with processes or threads. Depending on your current use case, you might consider using multiprocessing.pool.ThreadPool for your outer Pool, which will result in threads (which are allowed to spawn processes from within) as opposed to processes.

It might be limited by the GIL, but in my particular case (I tested both), the startup time for the processes of the outer Pool, as created in the accepted answer, far outweighed the cost of the ThreadPool solution.


It's really easy to swap Processes for Threads. Read more about how to use a ThreadPool solution here or here.
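
A minimal sketch of this approach (the square and work functions are illustrative): the outer pool uses threads, and each thread is free to start its own process pool.

import multiprocessing
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

def work(n):
    # Runs in a thread of the outer ThreadPool; threads may spawn
    # processes, so a nested process Pool works here.
    with multiprocessing.Pool(2) as pool:
        return pool.map(square, range(n))

if __name__ == '__main__':
    with ThreadPool(3) as outer:
        print(outer.map(work, [2, 3, 4]))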

3 Comments

Thanks - this helped me a lot - great use of threading here (to spawn processes which actually perform well)
For people looking for a practical solution that probably applies to their situation, this is the one.
Users choosing a process pool are presumably CPU-bound and/or need cancellable tasks, so threads are not an option. This doesn't really answer the question.
10

On some Python versions, replacing the standard Pool with a custom one can raise the error: AssertionError: group argument must be None for now.

Here I found a solution that can help:

import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, val):
        pass

class NoDaemonProcessPool(multiprocessing.pool.Pool):
    def Process(self, *args, **kwds):
        proc = super(NoDaemonProcessPool, self).Process(*args, **kwds)
        proc.__class__ = NoDaemonProcess
        return proc
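
Usage then mirrors the standard Pool; for instance (a sketch, where work is a hypothetical top-level function that creates its own inner pool):

if __name__ == '__main__':
    with NoDaemonProcessPool(processes=4) as pool:  # workers are non-daemonic
        results = pool.map(work, range(4))          # 'work' is hypothetical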

Comments

5

I have seen people dealing with this issue by using celery's fork of multiprocessing called billiard (multiprocessing pool extensions), which allows daemonic processes to spawn children. The workaround is to simply replace the multiprocessing module by:

import billiard as multiprocessing 
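
With that single change, the nested example from the question should then run as-is (an untested sketch; billiard mirrors the multiprocessing API):

import billiard as multiprocessing

def function_c(x):
    return x * x

def function_b(n):
    # Per the claim above, billiard permits this nested pool
    # even inside a daemonic worker.
    with multiprocessing.Pool(2) as inner:
        return sum(inner.map(function_c, range(n)))

if __name__ == '__main__':
    with multiprocessing.Pool(2) as outer:
        print(outer.map(function_b, [3, 4]))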

Comments

4

The issue I encountered was in trying to import globals between modules, causing the ProcessPool() line to get evaluated multiple times.

globals.py

from multiprocessing import Manager, Lock
from pathos.multiprocessing import ProcessPool
from pathos.threading import ThreadPool

class SingletonMeta(type):
    def __new__(cls, name, bases, dict):
        dict['__deepcopy__'] = dict['__copy__'] = lambda self, *args: self
        return super(SingletonMeta, cls).__new__(cls, name, bases, dict)

    def __init__(cls, name, bases, dict):
        super(SingletonMeta, cls).__init__(name, bases, dict)
        cls.instance = None

    def __call__(cls, *args, **kw):
        if cls.instance is None:
            cls.instance = super(SingletonMeta, cls).__call__(*args, **kw)
        return cls.instance

    def __deepcopy__(self, item):
        return item.__class__.instance

class Globals(object):
    __metaclass__ = SingletonMeta
    """
    This class is a workaround to the bug:
    AssertionError: daemonic processes are not allowed to have children

    The root cause is that importing this file from different modules causes this
    file to be reevaluated each time, thus ProcessPool() gets reexecuted inside
    that child thread, thus causing the daemonic processes bug.
    """
    def __init__(self):
        print("%s::__init__()" % (self.__class__.__name__))
        self.shared_manager      = Manager()
        self.shared_process_pool = ProcessPool()
        self.shared_thread_pool  = ThreadPool()
        self.shared_lock         = Lock()  # BUG: Windows: global name 'lock' is not defined | doesn't affect cygwin

Then import safely from elsewhere in your code:

from globals import Globals

Globals().shared_manager
Globals().shared_process_pool
Globals().shared_thread_pool
Globals().shared_lock

I have written a more extensive wrapper class around pathos.multiprocessing here:

As a side note, if your use case just requires an async multiprocess map as a performance optimization, then joblib will manage all your process pools behind the scenes and allow this very simple syntax:

from joblib import Parallel, delayed

squares = Parallel(-1)(
    delayed(lambda num: num**2)(x)
    for x in range(100)
)

Comments

4

Here is how you can start a pool even if you are in a daemonic process already. This was tested in Python 3.8.5.

First, define the Undaemonize context manager, which temporarily deletes the daemon state of the current process.

import multiprocessing
import multiprocessing.process

class Undaemonize(object):
    '''Context manager to resolve AssertionError: daemonic processes are not allowed to have children

    Tested in Python 3.8.5'''
    def __init__(self):
        self.p = multiprocessing.process.current_process()
        if 'daemon' in self.p._config:
            self.daemon_status_set = True
        else:
            self.daemon_status_set = False
        self.daemon_status_value = self.p._config.get('daemon')

    def __enter__(self):
        if self.daemon_status_set:
            del self.p._config['daemon']

    def __exit__(self, type, value, traceback):
        if self.daemon_status_set:
            self.p._config['daemon'] = self.daemon_status_value

Now you can start a pool as follows, even from within a daemon process:

with Undaemonize():
    pool = multiprocessing.Pool(1)
pool.map(...)  # you can do something with the pool outside of the context manager

While the other approaches here aim to create a pool that is not daemonic in the first place, this approach allows you to start a pool even if you are already in a daemonic process.

2 Comments

current_process() has been moved to multiprocessing.current_process() in the latest Python version - tested in Python 3.10.12.
This is the only solution that works when we are forced to spawn processes, inside of workers spawned using multiprocessing Pool (for eg, in Pytorch dataloader).
1

This presents a workaround for when the error is seemingly a false positive. As also noted by James, this can happen due to an unintentional import from a daemonic process.

For example, if you have the following simple code, WORKER_POOL can inadvertently be imported from a worker, leading to the error.

import multiprocessing

WORKER_POOL = multiprocessing.Pool()

A simple but reliable approach for a workaround is:

import multiprocessing
import multiprocessing.pool


class MyClass:

    @property
    def worker_pool(self) -> multiprocessing.pool.Pool:
        # Ref: https://stackoverflow.com/a/63984747/
        try:
            return self._worker_pool  # type: ignore
        except AttributeError:
            # pylint: disable=protected-access
            self.__class__._worker_pool = multiprocessing.Pool()  # type: ignore
            return self.__class__._worker_pool  # type: ignore  # pylint: enable=protected-access
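
For instance, with a hypothetical top-level function task, the pool is created lazily on first access and reused afterwards:

def task(x):
    return x * x  # 'task' is a hypothetical worker function

if __name__ == '__main__':
    obj = MyClass()
    print(obj.worker_pool.map(task, range(5)))  # creates the pool once
    print(obj.worker_pool.map(task, range(5)))  # reuses the same pool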

In the above workaround, MyClass.worker_pool can be used without the error. If you think this approach can be improved upon, let me know.

Comments

1

Since Python 3.7, we can create a non-daemonic ProcessPoolExecutor.

Using if __name__ == "__main__": is necessary while using multiprocessing.

from concurrent.futures import ProcessPoolExecutor as Pool

num_pool = 10

def main_pool(num):
    print(num)
    strings_write = (f'{num}-{i}' for i in range(num))
    with Pool(num) as subp:
        subp.map(sub_pool, strings_write)
    return None

def sub_pool(x):
    print(f'{x}')
    return None

if __name__ == "__main__":
    with Pool(num_pool) as p:
        p.map(main_pool, list(range(1, num_pool+1)))

1 Comment

If I change your code to return a value from both main_pool and sub_pool, then it does not work. result = p.map(main_pool, list(range(1, num_pool+1))) instantly returns a generator instead of the final result
0

For me, it was this answer that helped: https://stackoverflow.com/a/71929459/14715428

Although the question itself is a little bit different, the speed remained the same and I didn't have to rewrite anything.

Comments
