I am trying to understand how to build a parallel pipeline out of multiple subprocesses. As far as I can see, each subprocess.run call waits for the previous one to finish, whereas the steps in my pipeline do not depend on one another and could run in parallel. Is this possible, and if so, could someone show sample syntax for it? Thanks in advance.

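    import sys
    import os
    import subprocess

    # run_date, this_wk, last_wk and prev_wk are strings defined elsewhere.
    # Each subprocess.run() call blocks until its script has finished.
    # shell=True is dropped: combined with a list, POSIX passes only "python" to the shell.
    subprocess.run("python pipelinecode1.py".split() + [run_date, this_wk, last_wk, prev_wk])
    subprocess.run("python pipelinecode2.py".split() + [run_date, this_wk, last_wk, prev_wk])
    subprocess.run("python pipelinecode3.py".split() + [run_date, this_wk, last_wk, prev_wk])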

1 Answer

The MCVE as posted shows no dependency between the scripts and nothing that has to happen inside the calling Python interpreter, so the most efficient way to run a set of mutually independent tasks (not a pipeline in the strict sense, where the one-step-after-another order of processing "forms" the pipeline) is GNU parallel:

    $ parallel python {} run_date this_wk last_wk prev_wk ::: pipelinecode1.py \
                                                              pipelinecode2.py \
                                                              pipelinecode3.py

This way you do not spend CPU or cache resources on a Python launcher process, and you avoid the blocking subprocess.run() calls that re-serialise the execution. Each script runs in its own interpreter process, so no GIL is shared between them.

For all available options, see man parallel.
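If the launch has to stay inside Python, the same idea can be sketched with subprocess.Popen, which returns immediately instead of blocking the way subprocess.run does. This is only a minimal sketch and not part of the GNU parallel approach above. The helper names build_commands and run_parallel are hypothetical, and the four arguments are assumed to be plain strings:

```python
import subprocess
import sys


def build_commands(scripts, args):
    """Build one argument list per script, using the current interpreter.

    Hypothetical helper: `scripts` are script paths, `args` are strings
    passed unchanged to every script.
    """
    return [[sys.executable, script, *args] for script in scripts]


def run_parallel(commands):
    """Start every command at once, then wait for all of them.

    Returns the exit codes in the same order as `commands`.
    """
    # Popen returns as soon as the child is started, so all children
    # run concurrently as separate processes.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until that child exits and returns its exit code.
    return [proc.wait() for proc in procs]


# Usage with the scripts from the question (argument values are whatever
# strings the scripts expect):
#
#   scripts = ["pipelinecode1.py", "pipelinecode2.py", "pipelinecode3.py"]
#   codes = run_parallel(build_commands(scripts, [run_date, this_wk, last_wk, prev_wk]))
```

A non-zero entry in the returned list means the corresponding script failed. Unlike GNU parallel, this sketch starts all scripts at once and does not limit how many run at the same time.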

2 Comments

I am not familiar with the syntax you suggest. Should I run it in cmd? I am using Atom as my editor.
@CagdasKanar Yes, this is the standard GNU package. Install it if it is not installed yet, read the man page, and run it from a terminal; feel free to configure any of the additional options it offers.
