
To maximize CPU usage (I run things on Debian Lenny in EC2) I have a simple script that launches jobs in parallel:

#!/bin/bash
for i in apache-200901*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200902*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200903*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200904*.log; do echo "Processing $i ..."; do_something_important; done &
...

I'm quite satisfied with this working solution; however, I couldn't figure out how to write further code to be executed only once ALL of the loops have been completed.

Is there a way to do this?


6 Answers


There's a bash builtin command for that.

wait [n ...]
    Wait for each specified process and return its termination status. Each n may be a process ID or a job specification; if a job spec is given, all processes in that job's pipeline are waited for. If n is not given, all currently active child processes are waited for, and the return status is zero. If n specifies a non-existent process or job, the return status is 127. Otherwise, the return status is the exit status of the last process or job waited for.
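For example, a minimal sketch of how this applies to the script from the question (do_something_important is the asker's placeholder command, not a real program):

#!/bin/bash
# Each monthly loop runs in the background, exactly as in the question.
for i in apache-200901*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200902*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200903*.log; do echo "Processing $i ..."; do_something_important; done &

wait    # with no arguments, blocks until every background job has finished

echo "All loops completed"    # runs only once ALL of the loops are done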

5 Comments

Hint: use wait $(jobs -p) to wait for the newly created jobs.
@lambacck isn't wait with no argument equivalent?
Or use wait $(jobs -rp) if you have other jobs backgrounded (such as when you suspended vim with Ctrl+Z): the additional -r flag restricts the list to running jobs, so stopped ones are not waited for.
I know this is a Bash question, but in case anyone wants to know the Zsh equivalent, here it is. Zsh jobs doesn't have an equivalent -p option, so you can use AWK to parse the output. Something like this should work: wait $( jobs -r | awk '{ gsub("[\\[\\]]", "", "%" $1) ; print "%"$1 ; }' ).
this ZSH equivalent gives error: awk: syntax error at source line 1 context is { gsub("[\\[\\]]", "", >>> "%" <<< awk: illegal statement at source line 1 awk: illegal statement at source line 1

Using GNU Parallel will make your script even shorter and possibly more efficient:

parallel 'echo "Processing "{}" ..."; do_something_important {}' ::: apache-*.log 

This will run one job per CPU core and continue to do that until all files are processed.

Your solution will basically split the jobs into groups before running them. Here, 32 jobs in 4 groups:

[figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[figure: GNU Parallel scheduling]

To learn more, see the GNU Parallel documentation and tutorial.
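As a sketch, the whole script from the question might collapse to something like this (assuming do_something_important accepts the file name as an argument; -j and --joblog are standard GNU Parallel options for the number of simultaneous jobs and a per-job log):

#!/bin/bash
# Run one job per CPU core over all the log files. parallel only returns
# once every job has finished, so the final echo runs after ALL files are done.
parallel -j "$(nproc)" --joblog parallel.log \
    'echo "Processing {} ..."; do_something_important {}' ::: apache-*.log

echo "All log files processed"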

6 Comments

This parallel --citation is a bit weird
While this is good for CPU-intensive tasks, wouldn't this add more waste during jobs that involve lots of idle time (like ones making web requests)?
@b-rad15 If you need to have, say, 250 slow web requests running in parallel, you will waste a little CPU time. But since this CPU would be sitting idle anyway, you are unlikely to notice the loss. The overhead is ~10 ms CPU time per job - which is noticeable for very short jobs, but not a problem for longer running jobs.
@eri parallel ./worker job < /opt/joblist.txt. Spend 15 minutes reading chapters 1+2 of zenodo.org/record/1146014. Your command line will thank you for it.

I had to do this recently and ended up with the following solution:

while true; do
    wait -n || {
        code="$?"
        ([[ $code = "127" ]] && exit 0 || exit "$code")
        break
    }
done;

Here's how it works:

wait -n returns as soon as one of the (potentially many) background jobs exits. The || branch keeps the loop running until one of two things happens:

  1. Exit code 127: there are no background jobs left to wait for, i.e. the last one has already exited. In that case, we ignore the exit code and exit the sub-shell with code 0.
  2. Any of the background jobs failed. We just exit the sub-shell with that job's exit code.

With set -e, this will guarantee that the script will terminate early and pass through the exit code of any failed background job.
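For context, a hedged sketch of the pattern in a full script (the work function and the deliberate failure on the 3rd job are made up to show the error propagation; the loop itself is the code from above):

#!/bin/bash
set -e

# Hypothetical worker: succeeds for every argument except 3.
work() {
    sleep 1
    [ "$1" -ne 3 ] || { echo "job $1 failed" >&2; exit 1; }
    echo "job $1 done"
}

for n in 1 2 3 4; do
    work "$n" &
done

# Reap background jobs one at a time. 127 means none are left (success);
# any other non-zero status is a failed job and, with set -e, ends the script.
while true; do
    wait -n || {
        code="$?"
        ([[ $code = "127" ]] && exit 0 || exit "$code")
        break
    }
done

echo "all background jobs succeeded"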

Comments


A minimal example with wait $(jobs -p):

for i in {1..3}
do
    (echo "process $i started" && sleep 5 && echo "process $i finished") &
done
sleep 0.1 # For sequential output
echo "Waiting for processes to finish"
wait $(jobs -p)
echo "All processes finished"

Example output:

process 1 started
process 2 started
process 3 started
Waiting for processes to finish
process 2 finished
process 1 finished
process 3 finished
All processes finished

Comments


If you just want to wait for all the jobs and return, use the following one-liner.

while wait -n; do : ; done;   # loop until there are no background jobs left to wait for

N.B. wait -n returns as soon as any one of the jobs completes; the loop keeps calling it until none remain (or one of them fails).
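A short sketch of the one-liner in context (the sleep commands are stand-ins for real background jobs):

#!/bin/bash
sleep 2 & sleep 3 & sleep 1 &    # three stand-in background jobs

# wait -n succeeds each time a job finishes; once no jobs are left it
# returns 127 and the loop ends.
while wait -n; do : ; done

echo "all background jobs have finished"

Note that the loop also ends as soon as any job exits with a non-zero status, so a failing job can cut the wait short while other jobs are still running.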

Comments


This is my crude solution:

function run_task {
    cmd=$1
    output=$2
    concurency=$3
    if [ -f ${output}.done ]; then
        # experiment already run
        echo "Command already run: $cmd. Found output $output"
        return
    fi
    count=`jobs -p | wc -l`
    echo "New active task #$count: $cmd > $output"
    $cmd > $output && touch $output.done &
    stop=$(($count >= $concurency))
    while [ $stop -eq 1 ]; do
        echo "Waiting for $count worker threads..."
        sleep 1
        count=`jobs -p | wc -l`
        stop=$(($count > $concurency))
    done
}

The idea is to use "jobs" to see how many children are active in the background and wait till this number drops (a child exits). Once a child exits, the next task can be started.

As you can see, there is also a bit of extra logic to avoid running the same experiments/commands multiple times. It does the job for me; however, this logic could be either skipped or further improved (e.g., checking file creation timestamps, input parameters, etc.).
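For illustration, a hedged sketch of how run_task might be driven from the rest of a script (the file pattern and do_something_important are placeholders borrowed from the question; the final wait catches whatever jobs are still running when the loop ends):

#!/bin/bash
# ... definition of run_task from above goes here ...

for f in apache-*.log; do
    run_task "do_something_important $f" "$f.out" 4    # keep roughly 4 workers busy
done

wait    # run_task only throttles; this waits for the last batch of background jobs

echo "All tasks finished"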

Comments
