
I have a bash script that looks like this:

#!/bin/bash
wget LINK1 >/dev/null 2>&1
wget LINK2 >/dev/null 2>&1
wget LINK3 >/dev/null 2>&1
wget LINK4 >/dev/null 2>&1
# ..
# ..
wget LINK4000 >/dev/null 2>&1

But processing each line serially, waiting until each command is finished before moving on to the next, is very time consuming. I want to process, for instance, 20 lines at once, and when they're finished, process the next 20.

I thought of wget LINK1 >/dev/null 2>&1 & to send the command to the background and carry on, but there are 4000 lines here, so this would cause performance issues, not to mention being limited in how many processes I should start at the same time, so this is not a good idea.

One solution that I'm thinking of right now is checking whether one of the commands is still running or not; for instance, after 20 lines I can add this loop:

while [ $(ps -ef | grep KEYWORD | grep -v grep | wc -l) -gt 0 ]; do
    sleep 1
done

Of course in this case I will need to append & to the end of each line! But I feel this is not the right way to do it.

So how do I actually group each 20 lines together and wait for them to finish before going on to the next 20 lines? This script is dynamically generated, so I can do whatever math I want on it while it's being generated. And it does NOT have to use wget; that was just an example, so any wget-specific solution is not going to do me any good.

  • wait is the right answer here, but your while [ $(ps … would be much better written as while pkill -0 $KEYWORD… – using proctools… that is, to legitimately check whether a process with a specific name is still running. Commented Oct 23, 2013 at 13:46
  • I think this question should be re-opened. The "possible duplicate" QA is all about running a finite number of programs in parallel. Like 2-3 commands. This question, however, is focused on running commands in e.g. a loop. (see "but there are 4000 lines"). Commented Jan 11, 2018 at 19:01
  • @VasyaNovikov Have you read all the answers to both this question and the duplicate? Every single answer to this question here can also be found in the answers to the duplicate question. That is precisely the definition of a duplicate question. It makes absolutely no difference whether or not you are running the commands in a loop. Commented Jan 11, 2018 at 23:08
  • @robinCTS there are intersections, but questions themselves are different. Also, 6 of the most popular answers on the linked QA deal with 2 processes only. Commented Jan 12, 2018 at 4:09
  • I recommend reopening this question because its answer is clearer, cleaner, better, and much more highly upvoted than the answer at the linked question, though it is three years more recent. Commented Apr 20, 2018 at 15:35

4 Answers


Use the wait built-in:

process1 &
process2 &
process3 &
process4 &
wait
process5 &
process6 &
process7 &
process8 &
wait

For the above example, 4 processes process1 ... process4 would be started in the background, and the shell would wait until those are completed before starting the next set.

From the GNU manual:

wait [jobspec or pid ...] 

Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
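Applied to the question's problem, a minimal sketch of batching 20 downloads at a time with wait might look like the following (the links array and the batch size are illustrative assumptions, not part of the original script):

#!/bin/bash
# Hypothetical list of URLs; in practice this comes from wherever the script is generated.
links=(LINK1 LINK2 LINK3 LINK4)   # ... up to LINK4000

batch=20
count=0
for link in "${links[@]}"; do
    wget "$link" >/dev/null 2>&1 &        # start each download in the background
    (( ++count % batch == 0 )) && wait    # after every 20 jobs, wait for the whole batch
done
wait    # wait for any jobs left in the final, partial batch

Note the trade-off pointed out in the comments below: the entire batch must finish before the next one starts, so a single slow download can hold up the remaining 19 slots.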


4 Comments

So basically i=0; waitevery=4; for link in "${links[@]}"; do wget "$link" & (( i++%waitevery==0 )) && wait; done >/dev/null 2>&1
Unless you're sure that each process will finish at the exact same time, this is a bad idea. You need to start up new jobs to keep the current total jobs at a certain cap .... parallel is the answer.
Is there a way to do this in a loop?
I've tried this but it seems that variable assignments done in one block are not available in the next block. Is this because they are separate processes? Is there a way to communicate the variables back to the main process?

See parallel. Its syntax is similar to xargs, but it runs the commands in parallel.
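For example, assuming GNU parallel is installed and the URLs are listed one per line in a file (links.txt is a hypothetical name), a sketch could be:

# Keep 20 wget jobs running; parallel starts a new job as soon as one finishes.
parallel -j 20 wget -q {} < links.txt

Unlike the batched wait approach, the job slots are refilled as individual downloads complete.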

6 Comments

This is better than using wait, since it takes care of starting new jobs as old ones complete, instead of waiting for an entire batch to finish before starting the next.
For example, if you have the list of links in a file, you can do cat list_of_links.txt | parallel -j 4 wget {} which will keep four wgets running at a time.
There is a new kid in town called pexec which is a replacement for parallel.
Providing an example would be more helpful
parallel --jobs 4 < list_of_commands.sh, where list_of_commands.sh is a file with a single command (e.g. wget LINK1, note without the &) on every line. May need to do CTRL+Z and bg after to leave it running in the background.

In fact, xargs can run commands in parallel for you. There is a special -P max_procs command-line option for that. See man xargs.
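A sketch, again assuming a hypothetical links.txt with one URL per line:

# -n 1 passes one URL per wget invocation; -P 20 keeps at most 20 running at once.
xargs -n 1 -P 20 wget -q < links.txt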

3 Comments

+100 this is great since it is built in, very simple to use, and can be done in a one-liner
Great to use for small containers, as no extra packages/dependencies are needed!
See this question for examples: stackoverflow.com/questions/28357997/…

You can run 20 processes and use the command:

wait 

Your script will wait and continue when all your background jobs are finished.
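Since the script in the question is generated anyway, this can be as simple as emitting a wait line after every 20 wget lines, e.g. (a sketch):

wget LINK1 >/dev/null 2>&1 &
wget LINK2 >/dev/null 2>&1 &
# ... through ...
wget LINK20 >/dev/null 2>&1 &
wait
wget LINK21 >/dev/null 2>&1 &
# ... and so on, with a wait after each group of 20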

