So I have a while loop:
```shell
cat live_hosts | while read host; do \
    sortstuff.sh -a "$host" > sortedstuff-"$host"; done
```

But this can take a long time. How would I use GNU Parallel for this while loop?
You don't use a while loop.
```shell
parallel "sortstuff.sh -a {} > sortedstuff-{}" < live_hosts
```

Note that this won't work if the lines in live_hosts are paths (e.g. /some/dir/file), because the command would expand to `sortstuff.sh -a /some/dir/file > sortedstuff-/some/dir/file`. There is no directory named `sortedstuff-`, so the redirect fails with "No such file or directory". For those cases use {//} (the directory part of the input line) and {/} (its basename); see the GNU Parallel manual for details:
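```shell
parallel "sortstuff.sh -a {} > {//}/sortedstuff-{/}" < live_hosts
```

Can I use tee with parallel when putting the output into sortedstuff, so that I can see the output as it goes?

Yes: replace > with | tee, e.g. the first command becomes:

```shell
parallel "sortstuff.sh -a {} | tee sortedstuff-{}" < live_hosts
```

Keep in mind that parallel groups output by default, so each job's output reaches the terminal only when that job finishes (the file written by tee still fills up as the job runs). Add --line-buffer if you want to see complete lines as they are produced.

As an old-school "do one thing and do it well" Unix guy, I'd put the string substitution stuff into a wrapper script: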
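```shell
#!/bin/sh
sortstuff.sh -a "$1" > sortedstuff-"$1"
```

If you call it wrapper.sh and make it executable (chmod +x wrapper.sh), the parallel command to call it would be:

```shell
parallel ./wrapper.sh < live_hosts
```

(Drop the ./ if wrapper.sh is somewhere on your PATH.) Note that you don't need cat for this kind of thing; redirecting live_hosts into parallel saves an external program invocation.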
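In the same spirit, the directory/basename split that {//} and {/} perform in the one-liner can also move into the script. A minimal sketch, assuming {//} behaves like dirname and {/} like basename (which is how the GNU Parallel manual describes them); output_path is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# output_path: hypothetical helper that maps one input line to its result file
# the way {//}/sortedstuff-{/} does: <directory part>/sortedstuff-<basename>.
output_path() {
    printf '%s/sortedstuff-%s\n' "$(dirname -- "$1")" "$(basename -- "$1")"
}

output_path /some/dir/file   # prints /some/dir/sortedstuff-file
output_path host1            # prints ./sortedstuff-host1
```

Inside wrapper.sh the redirect would then become `sortstuff.sh -a "$1" > "$(output_path "$1")"`, and plain host names keep landing in the current directory because dirname returns "." for them.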
You don't need parallel, since the body of the loop doesn't depend on previous iterations. Just start a new background process for each host.
```shell
while IFS= read -r host; do
    sortstuff.sh -a "$host" > sortedstuff-"$host" &
done < live_hosts
wait  # Optional, to block until the background tasks are done
```

parallel does make it easier to manage certain aspects, though; for example, you can limit the number of jobs running at the same time more easily.
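If you want to stay with plain background jobs but still cap how many run at once, one simple (if coarse) option is to start them in batches and wait for each batch to finish. A minimal POSIX sh sketch; MAX_JOBS, the sample host list, and the sortstuff function standing in for sortstuff.sh are all hypothetical, chosen only so the sketch runs on its own:

```shell
#!/bin/sh
# Sketch: run the loop's jobs in the background, but never more than
# MAX_JOBS at a time, by waiting for each full batch before starting more.
set -eu
cd "$(mktemp -d)"   # scratch directory, so no real files are overwritten

# Hypothetical stand-in for sortstuff.sh (called as: sortstuff -a HOST).
sortstuff() {
    printf 'processed %s\n' "$2"
}

MAX_JOBS=4   # assumption: tune to your CPU cores or disks

# Hypothetical sample input in place of the real live_hosts.
printf '%s\n' host1 host2 host3 host4 host5 host6 > live_hosts

running=0
while IFS= read -r host; do
    sortstuff -a "$host" > "sortedstuff-$host" &
    running=$((running + 1))
    if [ "$running" -ge "$MAX_JOBS" ]; then
        wait        # block until every job in this batch has finished
        running=0
    fi
done < live_hosts
wait                # collect the final, possibly smaller, batch
```

The trade-off is that each batch is only as fast as its slowest job, whereas parallel starts a new job as soon as any slot frees up.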
If the line count of live_hosts (wc -l live_hosts) is more than the number of disk spindles or CPU cores (depending on whether the task is I/O- or CPU-bound), you're going to eat up a lot of the advantage you get from parallelism with a solution like that. The ability of parallel to limit the number of jobs isn't just nice, it's near-essential if processing speed is your goal. By default GNU Parallel runs one job per CPU thread, and -j lets you pick a different limit.
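When GNU Parallel isn't installed, xargs -P can provide a similar per-core cap; it is supported by GNU and BSD xargs but is not part of POSIX. A minimal sketch, assuming `getconf _NPROCESSORS_ONLN` reports the number of online CPUs on your system; the stand-in sortstuff.sh and the sample host list are hypothetical:

```shell
#!/bin/sh
# Sketch: keep at most one job per online CPU running, using xargs -P
# as a swapped-in alternative to GNU Parallel.
set -eu
cd "$(mktemp -d)"   # scratch directory, so no real files are overwritten

# Hypothetical stand-in for the real sortstuff.sh, placed on PATH for this demo.
mkdir bin
cat > bin/sortstuff.sh <<'EOF'
#!/bin/sh
printf 'processed %s\n' "$2"
EOF
chmod +x bin/sortstuff.sh
PATH=$PWD/bin:$PATH

# Hypothetical sample input in place of the real live_hosts.
printf '%s\n' host1 host2 host3 host4 host5 > live_hosts

# Number of online CPUs; fall back to 4 if getconf can't tell us.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)

# -I {} runs one command per input line; the host is passed to sh -c as $1.
xargs -P "$ncpu" -I {} sh -c 'sortstuff.sh -a "$1" > "sortedstuff-$1"' sh {} < live_hosts
```

Passing each host as $1 to `sh -c`, rather than splicing it into the command string, keeps unusual characters in a line from being interpreted by the shell.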