I'm running a script over multiple CSV files using multiprocessing.
If a line matches the regex, the script writes that line to one or more new files (the new file's name equals the match).
I've noticed a problem when writing to the same file(s) from different processes (file locking). How can I fix this?
My code:
    import re
    import glob
    import os
    import multiprocessing

    pattern = 'abc|def|ghi|jkl|mno'
    regex = re.compile(pattern, re.IGNORECASE)

    def process_files(file):
        res_path = r'd:\results'
        with open(file, 'r+', buffering=1) as ifile:
            for line in ifile:
                matches = set(regex.findall(line))
                for match in matches:
                    res_file = os.path.join(res_path, match + '.csv')
                    with open(res_file, 'a') as rf:
                        rf.write(line)

    def main():
        p = multiprocessing.Pool()
        for file in glob.iglob(r'D:\csv_files\**\*.csv', recursive=True):
            p.apply_async(process_files, [file])
        p.close()
        p.join()

    if __name__ == '__main__':
        main()

Thanks in advance!
Regarding the argument passed to process_files: it is just a string with the file path (as yielded by glob.iglob).