I have an object with two attributes: a dict and an int. When I modify the object in a forked process via multiprocessing.Pool, the object I get back has the modified int attribute, but the dict isn't modified. Why is that?

    from multiprocessing import Pool

    def fork():
        someObject = SomeClass()
        for i in range(10):
            someObject.method(i)
        print("in fork, someObject has dct=%s and nbr=%i" % (someObject.dct, someObject.nbr))
        return someObject

    def test():
        pool = Pool(processes=1)
        result = pool.apply(func=fork)
        print("in main, someObject has dct=%s and nbr=%i" % (result.dct, result.nbr))

    class SomeClass(object):
        dct = {}
        nbr = 0

        def method(self, nbr):
            self.dct[nbr] = nbr
            self.nbr += nbr

    if __name__ == '__main__':
        test()

Output:

in fork, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45

in main, someObject has dct={} and nbr=45

2 Answers

The parent process has a different copy of SomeClass.dct and SomeClass.nbr than the child process(es).

The reason nbr is updated but not dct is that nbr actually becomes an instance variable when you do self.nbr+=nbr, which gets pickled and sent back to the parent process. But you never assign self.dct to anything, so self.dct (which actually refers to SomeClass.dct) does not get pickled.

You can see this by defining a __getstate__() on SomeClass:

    class SomeClass(object):
        dct = {}
        nbr = 0

        def method(self, nbr):
            self.dct[nbr] = nbr
            self.nbr += nbr

        def __getstate__(self):
            res = self.__dict__
            print("pickled", res)
            return res

This prints:

in fork, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45

('pickled', {'nbr': 45})

in main, someObject has dct={} and nbr=45
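The same effect can be reproduced in a single process, without a pool. The sketch below is only an illustration: it pickles the object, then rebinds SomeClass.dct to an empty dict to imitate a parent process whose class attribute was never touched, and unpickles. That reset step is an assumption made to simulate the parent and is not something multiprocessing does.

```python
import pickle


class SomeClass(object):
    dct = {}   # class attribute, shared by every instance
    nbr = 0    # class attribute until an instance assigns to it

    def method(self, nbr):
        self.dct[nbr] = nbr   # mutates SomeClass.dct in place
        self.nbr += nbr       # assigns, creating an instance attribute


obj = SomeClass()
for i in range(10):
    obj.method(i)

# By default, pickle saves the instance __dict__, which holds only 'nbr'.
payload = pickle.dumps(obj)

# Imitate the parent process: its SomeClass.dct was never mutated.
SomeClass.dct = {}

restored = pickle.loads(payload)
print(vars(restored))   # {'nbr': 45}
print(restored.dct)     # {} (falls back to the class attribute)
print(restored.nbr)     # 45
```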

You can force dct to be pickled by assigning it to "itself":

    class SomeClass(object):
        dct = {}
        nbr = 0

        def method(self, nbr):
            self.dct[nbr] = nbr
            self.dct = self.dct
            self.nbr += nbr

This prints:

in fork, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45

('pickled', {'nbr': 45, 'dct': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}})

in main, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45
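A more conventional variation on the same idea, offered here as a sketch rather than as part of the original answer, is to create dct in __init__ so that it is an instance attribute from the start. It then lives in the instance __dict__ and is pickled along with nbr, and separate instances no longer share one dict.

```python
class SomeClass(object):
    def __init__(self):
        # Instance attributes: stored in self.__dict__, so they are pickled.
        self.dct = {}
        self.nbr = 0

    def method(self, nbr):
        self.dct[nbr] = nbr
        self.nbr += nbr


obj = SomeClass()
for i in range(3):
    obj.method(i)

other = SomeClass()

print(vars(obj))     # {'dct': {0: 0, 1: 1, 2: 2}, 'nbr': 3}
print(vars(other))   # {'dct': {}, 'nbr': 0}
```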

2 Comments

You, sir, are the greatest. I spent several hours trying to figure this out. Thanks. I would like to understand why self.dct[nbr]=nbr doesn't "assign anything" though.
@DerekFarren What self.nbr += nbr does is: 1. read self.nbr, 2. add nbr to it, 3. assign the result back into self.nbr. In contrast, what self.dct[nbr] = nbr does is: 1. read self.dct (which is a reference to a dict object), 2. mutate the state of the dict by setting the key nbr to nbr. At no point do you actually assign the dict back to self.dct; you only mutate the state of it.
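The difference described in this comment can be observed directly. The sketch below adds a __setattr__ hook, which is purely illustrative and not in the original class, and records every attribute assignment made on the instance in a module-level list named assignments (also a hypothetical name).

```python
assignments = []  # attribute names passed to __setattr__, in order


class SomeClass(object):
    dct = {}
    nbr = 0

    def __setattr__(self, name, value):
        # Record every attribute assignment made on an instance.
        assignments.append(name)
        object.__setattr__(self, name, value)


obj = SomeClass()

obj.dct[1] = 1                      # item assignment on the shared dict
after_mutation = list(assignments)

obj.nbr += 1                        # reads obj.nbr, adds, assigns back
after_augmented = list(assignments)

print(after_mutation)    # []
print(after_augmented)   # ['nbr']
print(vars(obj))         # {'nbr': 1}
print(SomeClass.nbr)     # 0
```

Only the augmented assignment reaches __setattr__, so only nbr ends up in the instance __dict__; the dict is still the single object stored on the class.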

I found an alternative solution. Instead of using a plain dict, I used a multiprocessing.Manager().dict() and it worked as expected.
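The answer does not show its code, so the following is only one possible reading of it. It is a minimal sketch that assumes a POSIX system where the "fork" start method is available, matching the forked process in the question. The helper names fill and run are hypothetical. A manager dict is a proxy to a dict held in a separate server process, so writes made by the pool worker are visible to the parent.

```python
import multiprocessing as mp


def fill(shared, n):
    # Runs in the pool worker; writes go through the proxy to the manager.
    for i in range(n):
        shared[i] = i


def run(n=5):
    # Assumption: the "fork" start method exists on this platform.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        shared = manager.dict()
        with ctx.Pool(processes=1) as pool:
            pool.apply(fill, (shared, n))
        # Copy into a plain dict before the manager shuts down.
        return dict(shared)


if __name__ == "__main__":
    print(run())
```

Every access to the proxy is a round trip to the manager process, so this is slower than returning a plain dict from the worker.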
