
I have a large number of data files that need to be processed by a function A. Let's say there are 1000 files, and processing each one takes less than 15 minutes and 6 GB of memory. My computer has 32 GB of RAM and 8 CPUs, so to be safe I can run at most 4 processes (24 GB of memory and 4 CPUs) at a time. Can I use Python's multiprocessing package to create 4 processes, each of which repeatedly calls function A on the next data file, independently of the others? Each process would handle approximately 250 files on average, but since the file sizes differ, the split will not necessarily be even. Note that as soon as a process finishes a file, it should be assigned a new one immediately, regardless of whether the other processes have finished; that is, there should be no waiting for all four processes to finish at the same time. The return value of function A is not important here. Please provide code. Thank you for any suggestions.


2 Answers


I think the best solution is to use multiprocessing.Pool. It makes it really easy to set up a pool of processes (as many as you specify), then provide them with jobs to do in parallel. Here's some basic example code:

    import multiprocessing as mp

    def handle_file(filename):
        # do your processing here
        pass

    def process_files(list_of_files):
        pool = mp.Pool(4)  # argument is number of processes, default is the number of CPUs
        pool.map(handle_file, list_of_files)  # this returns a list of results, but you can ignore it
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit

This code will be a little slower than necessary, since it passes the results from the function calls back to the parent process (even if the return values are all None), but I suspect the overhead will be relatively small if your processing tasks take any significant amount of time.
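Since the question asks that an idle process pick up the next file immediately, here is a minimal sketch of one way to get that behaviour with the same Pool. By default `Pool.map` groups the inputs into chunks, so a worker can be handed several files at once; `imap_unordered` with `chunksize=1` hands out one file at a time. The body of `handle_file` (returning the file size), the `n_procs` parameter, and the `data/*.dat` path are placeholders for illustration, not part of the original question.

```python
import glob
import multiprocessing as mp
import os


def handle_file(filename):
    """Placeholder for function A: here it only returns the file size."""
    return os.path.getsize(filename)


def process_files(list_of_files, n_procs=4):
    """Run handle_file on every file and return how many were processed."""
    done = 0
    with mp.Pool(processes=n_procs) as pool:
        # chunksize=1: a worker that finishes a file takes the next single
        # file from the queue, regardless of what the other workers are doing.
        for _ in pool.imap_unordered(handle_file, list_of_files, chunksize=1):
            done += 1  # results arrive in completion order and are ignored
    return done


if __name__ == "__main__":
    files = sorted(glob.glob("data/*.dat"))  # hypothetical location of the inputs
    print(process_files(files), "files processed")
```

If memory is not fully released between files, `Pool` also accepts a `maxtasksperchild` argument that replaces a worker process after it has completed that many tasks; whether that is needed depends on what function A does.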


1000 files at 15 minutes each is about 250 hours, i.e. more than 10 days, if they are processed one after another on one machine. I'd distribute the work using something like Dispy. That would give you monitoring etc. for free.
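To make the arithmetic concrete, here is a rough upper-bound estimate, assuming every file takes the full 15 minutes and the files are spread evenly over the workers. The function name `estimate_hours` is made up for this illustration.

```python
import math


def estimate_hours(n_files, minutes_per_file, n_workers):
    """Upper bound on wall-clock hours if every file takes minutes_per_file."""
    rounds = math.ceil(n_files / n_workers)  # files each worker handles, at most
    return rounds * minutes_per_file / 60


print(estimate_hours(1000, 15, 1))  # 250.0 hours, about 10.4 days
print(estimate_hours(1000, 15, 4))  # 62.5 hours, about 2.6 days
```

So the 10-day figure applies to a single process; with the four processes from the question, the same machine would need roughly two and a half days.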
