How would I use GNU Parallel for this while loop?

Question

So I have a while loop:

cat live_hosts | while read host; do
    sortstuff.sh -a "$host" > sortedstuff-"$host"
done

But this can take a long time. How would I use GNU Parallel for this while loop?

don_crissti · Accepted Answer · 2015-09-11 18:38:40Z

You don't use a while loop.

parallel "sortstuff.sh -a {} > sortedstuff-{}" <live_hosts

Note that this won't work if the lines in live_hosts are paths (e.g. /some/dir/file): the command would expand to sortstuff.sh -a /some/dir/file > sortedstuff-/some/dir/file, which fails with "No such file or directory" because there is no directory named sortedstuff-. For those cases use {//} (the directory part of the input line) and {/} (its basename); see the GNU Parallel manual for details:

parallel "sortstuff.sh -a {} > {//}/sortedstuff-{/}" <live_hosts

Is it possible to use tee with parallel when putting the output into sortedstuff, so that I can see the output as it goes?
– Proletariat, Commented Sep 22, 2015 at 14:16
@Proletariat - you want the output on the terminal too? Just replace > with | tee; e.g. the first command becomes parallel "sortstuff.sh -a {} | tee sortedstuff-{}" <live_hosts
– don_crissti, Commented Sep 22, 2015 at 15:10

Warren Young · Accepted Answer · 2015-09-11 20:44:35Z

As an old-school "do one thing and do it well" Unix guy, I'd put the string substitution stuff into a wrapper script:

#!/bin/sh
sortstuff.sh -a "$1" > sortedstuff-"$1"

If you call it wrapper.sh, make it executable (chmod +x wrapper.sh) and put it somewhere in your PATH, the parallel command to call it would be:

parallel wrapper.sh < live_hosts

Note that you don't need cat for this kind of thing; redirecting the file with < saves an external program invocation.

chepner · Accepted Answer · 2015-09-11 17:47:17Z

You don't need parallel, since the body of the loop doesn't depend on previous iterations. Just start a new background process for each host.

# IFS= and -r make read take each line verbatim (no trimming, no backslash escapes)
while IFS= read -r host; do
    sortstuff.sh -a "$host" > sortedstuff-"$host" &
done < live_hosts
wait   # Optional, to block until the background tasks are done

parallel does make some things easier to manage, though; for instance, it can limit the number of jobs running at once.

If wc -l live_hosts is more than the number of disk spindles or CPU cores (depending on whether the task is I/O- or CPU-bound), you're going to eat up a lot of the advantage you get from parallelism with a solution like that. The ability of parallel to limit the number of jobs isn't just nice, it's near-essential if processing speed is your goal.
– Warren Young, Commented Sep 11, 2015 at 20:47
