I was wondering how, if at all possible, I can create a simple job manager in Bash to process several commands in parallel. That is, I have a big list of commands to run, and I'd like to have two of them running at any given time.
I know quite a bit about bash, so here are the requirements that make it tricky:
- The commands have variable running times, so I can't just spawn two, wait, and then continue with the next two. As soon as one command is done, the next command must be started.
- The controlling process needs to know the exit code of each command so that it can keep a total of how many failed.
I'm thinking I can somehow use trap, but I don't see an easy way to get the exit value of a child inside the handler.
So, any ideas on how this can be done?
Well, here is some proof-of-concept code that should probably work, but it breaks bash: it generates invalid command lines, hangs, and sometimes dumps core.
    # need monitor mode for trap CHLD to work
    set -m
    # store the PIDs of the children being watched
    declare -a child_pids

    function child_done
    {
        echo "Child $1 result = $2"
    }

    function check_pid
    {
        # check if running
        kill -s 0 $1
        if [ $? == 0 ]; then
            child_pids=("${child_pids[@]}" "$1")
        else
            wait $1
            ret=$?
            child_done $1 $ret
        fi
    }

    # check by copying pids, clearing list and then checking each, check_pid
    # will add back to the list if it is still running
    function check_done
    {
        to_check=("${child_pids[@]}")
        child_pids=()

        for ((i=0;$i<${#to_check};i++)); do
            check_pid ${to_check[$i]}
        done
    }

    function run_command
    {
        "$@" &
        pid=$!
        # check this pid now (this will add to the child_pids list if still running)
        check_pid $pid
    }

    # run check on all pids anytime some child exits
    trap 'check_done' CHLD

    # test
    for ((tl=0;tl<10;tl++)); do
        run_command bash -c "echo FAIL; sleep 1; exit 1;"
        run_command bash -c "echo OKAY;"
    done

    # wait for all children to be done
    wait

Note that this isn't what I ultimately want, but it would be the groundwork for getting there.
Follow-up: I've implemented a system to do this in Python, so anybody using Python for scripting can have the above functionality. Refer to shelljob.
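For anyone who wants to stay in bash: the two requirements above (start the next command as soon as a slot frees up, and count failures by exit code) can probably be met without a CHLD trap at all. The following is only a minimal sketch, assuming bash 4.3 or later, where the `wait -n` builtin waits for any single child and returns its exit status. The names `max_jobs`, `reap_one`, `running`, and `failed` are made up for illustration, and this `run_command` is a simplified stand-in for the one in the proof of concept, not a fixed version of it.

```bash
#!/usr/bin/env bash
# Sketch only: assumes bash >= 4.3 (for `wait -n`).
# All names below are illustrative.

max_jobs=2   # how many commands may run at once
running=0    # children started but not yet reaped
failed=0     # children that exited with a non-zero status

# Block until any one child exits, then record its result.
reap_one() {
    if ! wait -n; then
        failed=$((failed + 1))
    fi
    running=$((running - 1))
}

# Start a command in the background, waiting for a free slot first.
run_command() {
    while (( running >= max_jobs )); do
        reap_one
    done
    "$@" &
    running=$((running + 1))
}

# test: three commands that fail and three that succeed
for ((i = 0; i < 3; i++)); do
    run_command bash -c "sleep 0.2; exit 1"
    run_command bash -c "exit 0"
done

# drain whatever is still running
while (( running > 0 )); do
    reap_one
done

echo "failed=$failed"
```

Because the exit status is collected synchronously in the main flow rather than in a signal handler, there is no shared PID list to corrupt. The trade-off is that `wait -n` does not say which child finished, only how it exited, which is enough for keeping a failure count.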