17

I have a bash script similar to:

NUM_PROCS=$1
NUM_ITERS=$2

for ((i=0; i<$NUM_ITERS; i++)); do
    python foo.py $i arg2 &
done

What's the most straightforward way to limit the number of parallel processes to NUM_PROCS? I'm looking for a solution that doesn't require packages/installations/modules (like GNU Parallel) if possible.

When I tried Charles Duffy's latest approach, I got the following error from bash -x:

+ python run.py args 1
+ python run.py ... 3
+ python run.py ... 4
+ python run.py ... 2
+ read -r line
+ python run.py ... 1
+ read -r line
+ python run.py ... 4
+ read -r line
+ python run.py ... 2
+ read -r line
+ python run.py ... 3
+ read -r line
+ python run.py ... 0
+ read -r line

... continuing with other numbers between 0 and 5, until too many processes were started for the system to handle and the bash script was shut down.

15
  • Take a look at: GNU Parallel Commented Aug 4, 2016 at 18:04
  • See: Parallelize Bash Script with maximum number of processes or Bash: limit the number of concurrent jobs? Commented Aug 4, 2016 at 18:13
  • ...unfortunately, the accepted answer there (err, as-edited, on the first proposed duplicate) is pretty awful. Commented Aug 4, 2016 at 18:14
  • (btw, seq isn't a standardized command -- not part of bash, and not part of POSIX, so there's no reason to believe it'll be present or behave a particular way on any given operating system. And re: case for shell variables, keeping in mind that they share a namespace with environment variables, see fourth paragraph of pubs.opengroup.org/onlinepubs/009695399/basedefs/… for POSIX conventions). Commented Aug 4, 2016 at 18:42
  • wait -n was introduced in bash 4.3. Commented Aug 4, 2016 at 19:57

6 Answers 6

14

bash 4.4 will have an interesting new type of parameter expansion that simplifies Charles Duffy's answer.

#!/bin/bash

num_procs=$1
num_iters=$2
num_jobs="\j"  # The prompt escape for number of jobs currently running
for ((i=0; i<num_iters; i++)); do
  while (( ${num_jobs@P} >= num_procs )); do
    wait -n
  done
  python foo.py "$i" arg2 &
done


13

GNU, macOS/OSX, FreeBSD and NetBSD can all do this with xargs -P, no bash versions or package installs required. Here's 4 processes at a time:

printf "%s\0" {1..10} | xargs -0 -I @ -P 4 python foo.py @ arg2 
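A minimal, self-contained sketch of the same xargs -P idea follows. The function name run_limited and the echo-based worker are assumptions made for illustration; the worker stands in for "python foo.py <i> arg2". It uses -n 1 with sh -c rather than -I so that each job receives its argument as $1.

```bash
#!/usr/bin/env bash
# Sketch: run at most max_procs commands at a time with xargs -P.
# run_limited and the echo worker are hypothetical stand-ins for
# "python foo.py <i> arg2".

run_limited() {
  local max_procs=$1
  shift
  # Emit one NUL-terminated argument per job; -n 1 hands each command
  # exactly one argument, and -P caps how many commands run concurrently.
  printf '%s\0' "$@" |
    xargs -0 -n 1 -P "$max_procs" sh -c 'echo "job $1 done"' _
}

# Six jobs, at most two running at any moment.
run_limited 2 1 2 3 4 5 6
```

Because the jobs run concurrently, the order of the output lines is not guaranteed; pipe through sort if a stable order matters.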


8

As a very simple implementation, depending on a version of bash new enough to have wait -n (to wait until only the next job exits, as opposed to waiting for all jobs):

#!/bin/bash
#      ^^^^ - NOT /bin/sh!

num_procs=$1
num_iters=$2

declare -A pids=( )

for ((i=0; i<num_iters; i++)); do
  while (( ${#pids[@]} >= num_procs )); do
    wait -n
    for pid in "${!pids[@]}"; do
      kill -0 "$pid" &>/dev/null || unset "pids[$pid]"
    done
  done
  python foo.py "$i" arg2 &
  pids["$!"]=1
done

If running on a shell without wait -n, one can (very inefficiently) replace it with a command such as sleep 0.2, to poll every 1/5th of a second.
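The polling variant described above might look roughly like the following sketch. The names run_polled and worker are assumptions for illustration, and worker merely sleeps in place of "python foo.py <i> arg2". It still needs associative arrays, so it assumes bash 4.0 or newer.

```bash
#!/usr/bin/env bash
# Sketch: cap concurrency by polling with sleep + kill -0 instead of wait -n.
# run_polled and worker are hypothetical names; worker stands in for
# "python foo.py <i> arg2".

worker() {
  sleep 0.3
}

run_polled() {
  local num_procs=$1 num_iters=$2 i pid
  local -A pids=()
  for ((i = 0; i < num_iters; i++)); do
    # While the pool is full, poll every 0.2 s and drop PIDs that have exited.
    while (( ${#pids[@]} >= num_procs )); do
      sleep 0.2
      for pid in "${!pids[@]}"; do
        kill -0 "$pid" 2>/dev/null || unset "pids[$pid]"
      done
    done
    worker "$i" &
    pids[$!]=1
  done
  wait  # wait for the jobs still running after the last one is started
}

# Five jobs, at most two at a time.
run_polled 2 5
```

The trade-off is latency: a free slot is noticed only at the next poll, so short jobs can leave slots idle for up to the polling interval.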


Since you're actually reading input from a file, another approach is to start N subprocesses, each of which processes only the lines where (linenum % N == threadnum):

num_procs=$1
infile=$2
for ((i=0; i<num_procs; i++)); do
  (
    while read -r line; do
      echo "Thread $i: processing $line"
    done < <(awk -v num_procs="$num_procs" -v i="$i" \
                 'NR % num_procs == i { print }' <"$infile")
  ) &
done
wait # wait for all the $num_procs subprocesses to finish
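To make the line assignment concrete, here is a small sketch that wraps the same loop in a function and runs it on a throwaway five-line file. The function name shard_lines and the sample input are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Sketch: each of num_procs workers handles the lines where
# NR % num_procs == i. shard_lines and the sample file are hypothetical.

shard_lines() {
  local num_procs=$1 infile=$2 i line
  for ((i = 0; i < num_procs; i++)); do
    (
      while IFS= read -r line; do
        echo "Thread $i: processing $line"
      done < <(awk -v num_procs="$num_procs" -v i="$i" \
                   'NR % num_procs == i' "$infile")
    ) &
  done
  wait  # block until every worker has consumed its share of the file
}

sample=$(mktemp)
printf '%s\n' alpha beta gamma delta epsilon > "$sample"
shard_lines 2 "$sample" | sort
rm -f "$sample"
```

Since awk numbers lines from 1, with two workers thread 1 receives lines 1, 3 and 5 (alpha, gamma, epsilon) and thread 0 receives lines 2 and 4 (beta, delta). Note that this splits the work statically, so one slow line holds up only its own worker, but an idle worker cannot take over lines assigned to another.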

24 Comments

I tried both your earlier solution and this one. The first solution didn't parallelize at all (ran one process); this one ran all num_iters at once and then crashed the system.
What's the meaning of wait -n?
@tomas, wait -n waits only for a single process to exit, as opposed to waiting for all background processes to exit.
Ahh. The advantage of read -a to read into an array, and then ${#array[@]} to test that array's length, is that unlike wc or tr, it's built into the shell itself -- the code in the first answer requires no external commands, whereas your pipeline has several mkfifo/fork/exec sequences required to execute. I'd have to repro the failure to speak to it.
@Amirmasudzarebidaki, ...that said, I replaced the first answer with an implementation that doesn't depend on process substitutions having access to the parent's job table -- some shell version without that property being the most obvious reason for the first implementation to fail.
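The difference between wait -n and plain wait discussed in these comments can be seen in a short sketch. The function name first_to_finish and the sleep durations are arbitrary assumptions; it requires bash 4.3 or newer.

```bash
#!/usr/bin/env bash
# Sketch (assumes bash 4.3+): wait -n returns as soon as ONE background
# job exits, whereas plain wait blocks until all of them have exited.
# first_to_finish and the sleep durations are hypothetical.

first_to_finish() {
  local long_pid
  sleep 0.2 &          # short job
  sleep 1 &            # long job
  long_pid=$!

  wait -n              # returns once the short job has exited

  # kill -0 sends no signal; it only checks that the process still exists.
  if kill -0 "$long_pid" 2>/dev/null; then
    echo "long job still running"
  fi

  wait                 # now block until the long job has finished as well
}

first_to_finish
```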
3

A relatively simple way to accomplish this with only two additional lines of code. Explanation is inline.

NUM_PROCS=$1
NUM_ITERS=$2

for ((i=0; i<$NUM_ITERS; i++)); do
    python foo.py $i arg2 &
    let 'i>=NUM_PROCS-1' && wait -n # once $NUM_PROCS workers are running, wait for one to exit before starting the next
done
wait # wait for all remaining workers

2 Comments

But is it possible to abort the command I am executing? I have already searched for SIGINT approaches, but I can't find anything useful that I can apply to this approach. Did you achieve it? Thanks.
@z3nth10n that's a more complex question that should be posted separately.
2

Are you aware that if you are allowed to write and run your own scripts, then you can also use GNU Parallel? In essence it is a Perl script in one single file.

From the README:

= Minimal installation =

If you just need parallel and do not have 'make' installed (maybe the system is old or Microsoft Windows):

wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/

seq $2 | parallel -j$1 python foo.py {} arg2

parallel --embed (available since 20180322) even makes it possible to distribute GNU Parallel as part of a shell script (i.e. no extra files needed):

parallel --embed >newscript 

Then edit the end of newscript.


1

This isn't the simplest solution, but if your version of bash doesn't have "wait -n" and you don't want to use other programs like parallel, awk etc, here is a solution using while and for loops.

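num_iters=10
total_threads=4
iter=1
# iter is the 1-based index of the next job to start
while [[ "$iter" -le "$num_iters" ]]; do
    # number of jobs not yet started, including job number $iter
    iters_remainder=$(( num_iters - iter + 1 ))
    if [[ "$iters_remainder" -lt "$total_threads" ]]; then
        threads=$iters_remainder
    else
        threads=$total_threads
    fi
    for ((t=1; t<=threads; t++)); do
        (
            : # do stuff (replace this no-op with the real command)
        ) &
        ((++iter))
    done
    wait # wait for the whole batch before starting the next one
done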

