I have a simple script in which a function does some calculation on a pandas.Series object, and I want to run that calculation in parallel. I have put the Series data into a shared memory object so that different processes can use it.
My code is given below.
```python
from multiprocessing import shared_memory
import pandas as pd
import numpy as np
import multiprocessing

s = pd.Series(np.random.randn(50))
s = s.to_numpy()

# Create a shared memory block shm_s which can be accessed by other processes
shm_s = shared_memory.SharedMemory(create=True, size=s.nbytes)
b = np.ndarray(s.shape, dtype=s.dtype, buffer=shm_s.buf)
b[:] = s[:]

# Create a dictionary to store the results, accessible after the processes finish
mgr = multiprocessing.Manager()
pred_sales_all = mgr.dict()

forecast_period = 1000

# My pseudo function to run in parallel
def predict_model(b, model_list_str, forecast_period, pred_sales_all):
    c = pd.Series(b)
    temp_add = model_list_str + forecast_period
    temp_series = c.add(model_list_str)
    pred_sales_all[str(temp_add)] = temp_series

# Parallel processing with shared memory
if __name__ == '__main__':
    model_list = [1, 2, 3, 4]
    all_process = []
    for model_list_str in model_list:
        # Set up a process to run
        process = multiprocessing.Process(
            target=predict_model,
            args=(b, model_list_str, forecast_period, pred_sales_all))
        # Start the process; we join() them separately, else each would finish
        # execution before moving on to the next process
        process.start()
        # Collect all processes together
        all_process.append(process)
    # Wait for all processes to finish
    for p in all_process:
        p.join()
```
This code works on Ubuntu; I checked. But when I run it on Windows, I get the following error:
```
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
```
I also tried the solution mentioned here in the Stack Overflow question.
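For reference, this is my understanding of the idiom the error message is asking for (a minimal sketch with a placeholder worker, not my actual script):

```python
import multiprocessing

def do_work(x):
    # Placeholder worker; the real function operates on the shared Series data.
    print(x * 2)

if __name__ == '__main__':
    # Under Windows the 'spawn' start method re-imports this module in every
    # child process, so anything that starts processes must sit under this guard.
    p = multiprocessing.Process(target=do_work, args=(21,))
    p.start()
    p.join()
```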
What is wrong with the code, and can someone solve the issue? Is my parallelization code wrong?
Answer:

You need the `if __name__ == '__main__':` guard when multiprocessing; under Windows, processes may be started from inside `if __name__ == '__main__':` only. `mgr = multiprocessing.Manager()` creates a process, so if you are running under Windows, that statement needs to be moved inside the `if __name__ == '__main__':` block. Also, `buffer = s` will clobber the previous assignment to `buffer`. You have other statements at global scope that should also be moved. I don't really think you are sharing anything.

Comment from the asker: Would doing `s.to_numpy()`, then `shm_s = shared_memory.SharedMemory(create=True, size=s.nbytes)`, `b = np.ndarray(s.shape, dtype=s.dtype, buffer=shm_s.buf)`, and `b[:] = s[:]`, and then passing `b` in the args of the process solve my shared memory issue? I am creating `b` and passing `b`. Is it sharable now? I am getting the output, though.
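Following the answer above, here is one way the script could be restructured so it also runs under Windows: everything that creates processes (including the `Manager()`) moves under the guard, and each child attaches to the shared memory block by its name rather than receiving the parent's array, since passing `b` itself would just pickle a copy under the `spawn` start method. This is a sketch, not a verified drop-in fix; the variable names are kept from the question.

```python
from multiprocessing import shared_memory
import multiprocessing
import numpy as np
import pandas as pd

forecast_period = 1000

def predict_model(shm_name, shape, dtype, model_list_str, forecast_period, pred_sales_all):
    # Attach to the existing shared memory block by name; the name is a small
    # string, so nothing large is pickled when the process is spawned.
    shm = shared_memory.SharedMemory(name=shm_name)
    b = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    c = pd.Series(b)
    temp_add = model_list_str + forecast_period
    pred_sales_all[str(temp_add)] = c.add(model_list_str)
    shm.close()  # detach from the block in this process

if __name__ == '__main__':
    s = pd.Series(np.random.randn(50)).to_numpy()

    # Create the shared block and copy the data in once.
    shm_s = shared_memory.SharedMemory(create=True, size=s.nbytes)
    b = np.ndarray(s.shape, dtype=s.dtype, buffer=shm_s.buf)
    b[:] = s[:]

    # Manager() starts a process, so under Windows it must be inside the guard.
    mgr = multiprocessing.Manager()
    pred_sales_all = mgr.dict()

    all_process = []
    for model_list_str in [1, 2, 3, 4]:
        process = multiprocessing.Process(
            target=predict_model,
            args=(shm_s.name, s.shape, s.dtype, model_list_str,
                  forecast_period, pred_sales_all))
        process.start()
        all_process.append(process)
    for p in all_process:
        p.join()

    print(dict(pred_sales_all))

    shm_s.close()
    shm_s.unlink()  # free the shared block once all processes are done
```

Note the cleanup at the end: the creating process should `unlink()` the block after the children have joined, otherwise the shared memory can leak.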