I currently have this piece of code (feel free to comment on it too :) )

    def threaded_convert_to_png(self):
        paths = self.get_pages()
        pool = Pool()
        result = pool.map(convert_to_png, paths)
        self.image_path = result

On an Intel i7 it spawns eight workers when running on Linux; however, when running on Windows 8.1 Pro it spawns only one worker. I checked, and cpu_count() returns 8 on both Linux and Windows.

  • Is there something I am missing here, or doing wrong?
  • Is there a way to fix that problem?

P.S. This is in Python 2.7.6

  • Pool.__init__ calls cpu_count to get the default number of processes (see Lib/multiprocessing/pool.py at line 146). Also, __init__ calls _repopulate_pool on line 159, which executes a loop on line 213 that spawns the correct number of Process instances. Are you sure only one worker is spawned? How are you checking the number of workers? Commented Feb 21, 2014 at 16:31
  • I'm sure because I only see one extra python process (and the conversion takes ages). I even tried passing Pool(processes=8), and again only one worker got spawned. Commented Feb 21, 2014 at 16:33
  • Try to create a minimal, complete code example that shows your issue, e.g., use def f(path): print path, mp.current_process() instead of convert_to_png(), and enable logging with mp.log_to_stderr().setLevel(logging.DEBUG). Commented Feb 24, 2014 at 13:16
  • What is len(paths)? Commented Feb 24, 2014 at 13:17
  • Have you properly enclosed your script in if __name__ == '__main__': and is convert_to_png properly defined outside of it? (documented here: docs.python.org/2/library/multiprocessing.html; a minimal sketch combining these suggestions follows below) Commented Feb 24, 2014 at 14:12
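
Putting the last two comments together, a minimal test script along the following lines can help narrow the problem down. This is only a sketch: f is a stand-in for convert_to_png, the page names are made up, and the if __name__ == '__main__': guard matters on Windows because worker processes re-import the main module there.

    import logging
    import multiprocessing as mp

    def f(path):
        # stand-in for convert_to_png; shows which worker handled which path
        print path, mp.current_process()
        return path

    if __name__ == '__main__':
        # required on Windows: workers re-import this module, so pool
        # creation must not happen at import time
        mp.log_to_stderr().setLevel(logging.DEBUG)
        pool = mp.Pool()
        print pool.map(f, ['page-%d.pdf' % i for i in range(16)])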

2 Answers


There is an easy way to determine what is happening in your pool: turn on multiprocessing debugging. You can do it like this:

    import logging
    from multiprocessing import util

    util.log_to_stderr(level=logging.DEBUG)

When you run the script, you will get full information about processes being spawned, running, and exiting.

In any case, the process pool always spawns N processes (where N is the value of the processes argument, or cpu_count() by default), but the distribution of tasks between the processes can be uneven; it depends on how long each task runs.
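
As a side note on that last point, map's chunksize argument controls how tasks are batched out to the workers. This is not from the answer above, just a small sketch with a made-up workload showing how a chunksize of 1 hands items out one at a time, which evens out the load when run times vary a lot (at the cost of more inter-process traffic):

    from multiprocessing import Pool

    def work(item):
        # placeholder for a task whose run time varies per item
        return item * item

    if __name__ == '__main__':
        pool = Pool()
        # chunksize=1 gives each idle worker one item at a time instead of
        # a pre-computed batch, so slow items do not pile up on one process
        results = pool.map(work, range(100), chunksize=1)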


I managed to solve my similar problem. I'm not sure if it will help you, but I decided to document it here anyway in case it helps someone.

In my case I was analyzing a huge number of tweets (52,000 in total) by dividing them among multiple processes. It worked fine on OS X and on a server, but on my Windows 8.1 machine it was really slow and the processes were activated sequentially. Looking at the Task Manager, I noticed that the main Python process's memory usage climbed to around 1.5 GB, and the worker process's memory usage climbed similarly. Then I noticed that an older version of my code, which used a slightly different algorithm, worked fine. In the end the problem was that I retrieved whole tweets from the database when I only needed the text part of the tweets, which apparently led to the growing memory usage. After I fixed that part, the program launched the worker processes properly.

So based on my experience, I have a hunch that Windows tries to control the RAM usage by blocking the worker processes. If so, check the RAM usage of your processes. This is just speculation on my part, so I'm interested if someone has a better explanation.
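
If that memory theory applies to your case, the fix described above boils down to sending the workers only the data they actually need. The following is just a rough sketch with made-up records (analyze and the tweet dictionaries are hypothetical), not the answerer's actual code:

    from multiprocessing import Pool

    def analyze(text):
        # hypothetical per-tweet analysis that only needs the text field
        return len(text.split())

    if __name__ == '__main__':
        # stand-in for rows fetched from a database; real records would be far larger
        tweets = [{'id': i, 'text': 'example tweet %d' % i, 'raw': 'x' * 1000}
                  for i in range(1000)]
        # keep only the text the workers need, so much less data has to be
        # pickled and shipped to each worker process
        texts = [t['text'] for t in tweets]
        pool = Pool()
        results = pool.map(analyze, texts)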
