I ran into a strange problem. I have a file in the following format:

```
START
1
2
STOP
lllllllll
START
3
5
6
STOP
```

and I want to read the lines between START and STOP as blocks, and use my_f to process each block.
```python
import itertools

def block_generator(file):
    with open(file) as lines:
        for line in lines:
            if line == 'START':
                # takewhile lazily pulls lines from the same open file
                # iterator until it meets 'STOP'
                block = itertools.takewhile(lambda x: x != 'STOP', lines)
                yield block
```

In my main function I tried to use map() to get the work done, and it worked:
```python
blocks = block_generator(file)
map(my_f, blocks)
```

This actually gives me what I want. But when I tried the same thing with multiprocessing.Pool.map(), it gave me an error saying takewhile() expected 2 arguments but was given 0:
```python
import multiprocessing

blocks = block_generator(file)
p = multiprocessing.Pool(4)
p.map(my_f, blocks)
```

Is this a bug?
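My guess (an assumption on my part, not something I have verified) is that Pool.map has to pickle every item it sends to a worker process, and each yielded block is a lazy takewhile iterator bound to the parent's open file, which cannot be pickled. A minimal sketch of that failure, independent of the file:

```python
import itertools
import pickle

block = itertools.takewhile(lambda x: x != 'STOP', iter(['1', '2', 'STOP']))

try:
    # multiprocessing does the equivalent of this for every item it ships
    pickle.dumps(block)
except Exception as e:
    # The exact exception differs across Python versions (the lambda
    # predicate and/or the takewhile object refuse to pickle), but it
    # always fails here.
    print(type(e).__name__, e)
```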
- The file has more than 1,000,000 blocks, each with fewer than 100 lines (given that scale, see the imap sketch after this list).
- I accepted the answer from untubu.
- But maybe I will simply split the file and use n instances of my original script, without multiprocessing, to process them, then cat the results together (sketched below). That way you can never be wrong, as long as the script works on a small file.
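On the 1,000,000-block scale mentioned above, a possibly useful tweak (my suggestion, not from the question): Pool.imap streams results instead of building one huge list, and a chunksize batches items per worker round-trip. This reuses block_generator and my_f from the question; the file name and the chunksize value 256 are placeholders to tune:

```python
import multiprocessing

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    # Stream results one at a time; each worker receives 256 blocks per
    # round-trip instead of one, cutting inter-process overhead.
    for result in pool.imap(my_f, block_generator('data.txt'), chunksize=256):
        print(result)
```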
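And a rough sketch of the splitting idea from the last point, assuming the blocks may be distributed round-robin; split_blocks and the .partN naming are my own placeholders:

```python
def split_blocks(path, n_parts):
    # Distribute complete START..STOP blocks round-robin over n_parts
    # chunk files, so each chunk can be handled by an independent run of
    # the original single-process script and the outputs cat'ed together.
    outs = [open('%s.part%d' % (path, i), 'w') for i in range(n_parts)]
    i = 0
    with open(path) as lines:
        for line in lines:
            if line.strip() == 'START':
                out = outs[i % n_parts]
                out.write(line)
                for inner in lines:
                    out.write(inner)
                    if inner.strip() == 'STOP':
                        break
                i += 1
    for f in outs:
        f.close()
```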
Have you tried `block = list(itertools.takewhile(lambda x: x != 'STOP', lines))` instead, so you don't have multiple iterators running at once?
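To make that suggestion concrete, here is a minimal sketch, assuming it is the laziness of the yielded blocks that breaks pickling; the file name data.txt and the body of my_f are placeholders, and I also strip newlines so the marker comparison matches whole lines:

```python
import itertools
import multiprocessing

def my_f(block):
    # Stand-in for the question's my_f: sum the numbers in one block.
    return sum(int(x) for x in block)

def block_generator(path):
    with open(path) as lines:
        for line in lines:
            if line.strip() == 'START':
                # Materializing the block right away means each yielded item
                # is a plain picklable list rather than a lazy iterator tied
                # to this process's open file handle.
                yield [x.strip() for x in
                       itertools.takewhile(lambda x: x.strip() != 'STOP', lines)]

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    print(pool.map(my_f, block_generator('data.txt')))
```

With the blocks materialized, the generator also never has two iterators over the file alive at once, which is the point of the comment above.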