I am having issues with my code when trying to run tasks in parallel using Python's multiprocessing library.

Here is my code. I have a function called extract_tag_data:

    def extract_tag_data(tag):
        search_bar.send_keys(tag)
        search_bar.send_keys(Keys.RETURN)
        for i in range(2):
            articles = driver.find_elements(By.XPATH, "//table[@class='table table-hover']/tbody/tr/td[2]/div[@class='media']/div[@class='media-body']/strong/a")
            for article in articles[:1]:
                article.click()
                dict['tag'] = tag
                dict['article_title'].append(unidecode.unidecode(driver.find_element(By.XPATH, '//h1[@class="title"]').text))
                dict['abstract'].append(unidecode.unidecode(driver.find_element(By.XPATH, '//div[@class="abstract"]/div[1]').text))
                dict['authors'].append(unidecode.unidecode(",".join([element.text for element in (driver.find_elements(By.XPATH, '//div[@class="authors"]/span'))])))
                dict['structs'].append(unidecode.unidecode(",".join([element.text for element in (driver.find_elements(By.XPATH, '//div[@class="authors"]/div[@class="structs"]/div[@class="struct"]/a'))])))
                driver.back()
            driver.find_element(By.XPATH, '//table[@class="table table-hover"]/tfoot/tr[1]/th[2]/ul/li/a/span[@class="glyphicon glyphicon-step-forward"]').click()

I want to run this function over the tags list in parallel:

    if __name__ == '__main__':
        with multiprocessing.get_context('spawn').Pool(3) as pool:
            pool.map(extract_tag_data, (tags))
            pool.close()
        driver.quit()
        df = pd.DataFrame(dict, columns=['article_title', 'authors', 'abstract', 'structs', 'tag'])
        df.to_excel(r"C:\\Users\\dell\\Desktop\\data collection\\myDataset.xlsx", sheet_name='Sheet1')
        driver.quit()

but I am getting the following error:

    File "C:\Users\dell\miniconda3\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
    [Done] exited with code=1 in 77.947 seconds
  • Can you post a minimal example that reproduces the error? This is most likely caused by some code outside the if __name__ == '__main__': block, as the error suggests. Commented Aug 31, 2022 at 18:29
  • I am not sure which part is causing the error, but this is almost all of my code; the rest is just the initialization of the webdriver and the other variables used. Commented Aug 31, 2022 at 18:53
  • It's the initialization of the webdriver that is causing the problem. I assume you are using Selenium, and since Selenium runs in its own process, you only need a multithreading pool, where each thread in the pool initializes its own Selenium instance. Ideally, each thread reuses its webdriver for all the tasks it processes. See this post and my answer, which ensures that the drivers are properly terminated. Commented Sep 2, 2022 at 11:21
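A minimal sketch of the thread-pool approach from the last comment, assuming one driver per worker thread, created by the pool's initializer and stored in a threading.local. FakeDriver, init_thread and scrape_all are hypothetical names: FakeDriver stands in for a real Selenium driver (for example webdriver.Chrome()) so the example runs without a browser, and extract_tag_data here is a simplified placeholder, not the question's function.

```python
import threading
from multiprocessing.pool import ThreadPool


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver such as webdriver.Chrome()."""

    def __init__(self):
        self.closed = False

    def get_title(self, tag):
        # Pretend to search for `tag` and read an article title.
        return f"title for {tag}"

    def quit(self):
        self.closed = True


_local = threading.local()  # separate attribute namespace for each thread


def init_thread(registry, lock):
    """Pool initializer: runs once in each worker thread before any task."""
    _local.driver = FakeDriver()
    with lock:
        registry.append(_local.driver)  # remembered so it can be quit later


def extract_tag_data(tag):
    # Simplified placeholder task: reuses the driver owned by this thread.
    return tag, _local.driver.get_title(tag)


def scrape_all(tags, n_threads=3):
    drivers, lock = [], threading.Lock()
    pool = ThreadPool(n_threads, initializer=init_thread, initargs=(drivers, lock))
    try:
        results = pool.map(extract_tag_data, tags)
    finally:
        pool.close()
        pool.join()  # wait for every worker thread, so every driver is registered
        for driver in drivers:
            driver.quit()
    return results, drivers


if __name__ == "__main__":
    results, drivers = scrape_all(["ml", "nlp", "cv", "robotics"])
    print(results)
    print(len(drivers), "drivers created and quit")
```

Because the workers are threads, no new process is started and the spawn bootstrapping check is never reached. Calling pool.join() before quitting ensures every driver created by an initializer is in the list when cleanup runs.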

1 Answer

The driver starts a child process when the pool process is created

A bit of a shot in the dark: I'm guessing that the driver starts its own subprocess when the module is loaded. This tricks the pool subprocess into thinking you have set up your multiprocessing code incorrectly. You should initialize the driver inside the if __name__ == '__main__': block and pass the driver as an argument to the pool process.
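A minimal sketch of the guard layout this answer suggests, assuming the file is run as a plain script (python script.py). It covers only the part about moving setup under the guard, not passing a driver to the workers, and extract_tag_data here is a hypothetical placeholder. With the spawn start method each worker re-imports the main module, so module-level code runs again in every child; keeping only imports and function definitions at module level means nothing is started during that import.

```python
import multiprocessing

# Module level: only imports and function definitions. Under "spawn" each
# worker process re-imports this file, so anything placed here runs again
# in every child.


def extract_tag_data(tag):
    # Hypothetical placeholder for the real scraping work.
    return f"processed {tag}"


def main():
    # Side effects (creating a webdriver, building the tag list, starting the
    # pool) live here and run only in the parent process.
    tags = ["ml", "nlp", "cv"]
    with multiprocessing.get_context("spawn").Pool(3) as pool:
        return pool.map(extract_tag_data, tags)


if __name__ == "__main__":
    print(main())  # ['processed ml', 'processed nlp', 'processed cv']
```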


4 Comments

The OP's goal may be to use multiple driver instances to do the processing simultaneously. In that case, it may be better to initialize each driver with the Pool's initializer option. Otherwise, the different actions each process takes on the shared driver may conflict.
Thanks, one driver per process seems interesting, but it doesn't work. I tried to instantiate the driver inside the function extract_tag_data, but that doesn't work. I also tried initializing the driver inside the if __name__ == '__main__' clause, but that doesn't solve the problem either.
If you comment out the driver code, do you still get a RuntimeError?
Yes, I tried a simple function that just prints, and I got the same error.
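A minimal sketch of the initializer option mentioned in the first comment, assuming one driver per pool process held in a module-level global. FakeDriver, init_worker and main are hypothetical names, and FakeDriver stands in for a real Selenium driver so the example runs without a browser. Shutting the per-process drivers down cleanly (calling quit() on each) is not shown here.

```python
import multiprocessing
import os


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver such as webdriver.Chrome()."""

    def __init__(self):
        self.pid = os.getpid()  # the process that owns this driver

    def get_title(self, tag):
        return f"title for {tag}"


_driver = None  # set separately in each pool process by init_worker()


def init_worker():
    """Pool initializer: runs once in each worker process before any task."""
    global _driver
    _driver = FakeDriver()


def extract_tag_data(tag):
    # Uses only the driver created in this process, so nothing is shared.
    return tag, _driver.get_title(tag), _driver.pid


def main(tags):
    with multiprocessing.get_context("spawn").Pool(3, initializer=init_worker) as pool:
        return pool.map(extract_tag_data, tags)


if __name__ == "__main__":
    for row in main(["ml", "nlp", "cv", "robotics"]):
        print(row)  # (tag, title, pid of the worker that handled it)
```

The pool and the initializer are still created only under the guard, so this layout also follows the idiom the error message asks for.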
