Checking for the existence of files against a list

Question

I have a "command" text file that issues a data file download command on each line. I send the command file to bash. However, a small percentage of the downloads fail. Here is the algorithm I use to find out what's missing:

After downloading, I go back through the command file and check if each download file exists.
If the download doesn't exist, I copy the command line into a new command file.
I am left with a new command file for the remaining downloads.

Here is the bash script I implemented the algorithm with:

    #!/bin/bash
    while read line
    do
        for item in $line
        do
            if [[ $item == *out_fname* ]]; then
                splitline=(${item//=/ })
                target_file=${splitline[1]}
                if [ ! -f $target_file ]; then
                    echo $line >> stillneed.txt
                fi
            fi
        done
    done < "$@"

Question: This works well, but is there a better algorithm or implementation (maybe using something other than bash)? What I did was just have bash do what a human would have to do. But it seems Unix always has a better way of doing things...

If you give some examples of the lines being read in, we could suggest more efficient algorithms.
– Arcege, Commented Dec 30, 2011 at 4:32
As you deduced, out_fname=myfile.txt is the relevant command-line argument.
– Pete, Commented Dec 30, 2011 at 4:38

Arcege · Accepted Answer · 2011-12-30 04:31:53Z

It looks like you are looking for 'out_fname=', not just 'out_fname'.
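
To illustrate the difference, here is a minimal sketch. The helper names and both sample arguments are made up for this example; they are not from the question.

```shell
#!/bin/bash
# Hypothetical helpers: the loose test from the question versus a test
# that also requires the "=" after out_fname.
matches_loose()       { [[ $1 == *out_fname* ]]; }
matches_with_equals() { [[ $1 == *out_fname=* ]]; }

# Hypothetical arguments: the second one merely contains "out_fname".
for arg in "out_fname=data.txt" "--log=out_fname_debug.log"; do
    if matches_loose "$arg"; then
        echo "loose match:       $arg"
    fi
    if matches_with_equals "$arg"; then
        echo "match with equals: $arg"
    fi
done
```

Both arguments pass the loose test, but only out_fname=data.txt passes the one that includes the equals sign.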

I'd do this either in a mix of awk and shell or in Python. In awk/shell:

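    # Print the value of every out_fname= argument, then keep the names
    # of the files that do not exist.
    awk '{
        for (i = 1; i <= NF; i++) {
            if (index($i, "out_fname=")) {
                split($i, A, /=/)
                print A[2]
            }
        }
    }' "$@" |
    while read -r filename; do
        if [ ! -f "$filename" ]; then
            echo "$filename"
        fi
    done > stillneed.txt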

In Python:

    import fileinput
    import os

    # Write the name of every missing out_fname= target to stillneed.txt.
    with open("stillneed.txt", "w") as stillneed:
        for line in fileinput.input():
            for item in line.split():
                if "out_fname=" in item:
                    filename = item.split("=", 1)[1]
                    if not os.path.exists(filename):
                        print(filename, file=stillneed)

Samus_ · Accepted Answer · 2011-12-30 04:20:36Z

Not sure if it'll help, but I have a function to retry commands until they return success:

    retry () {
        local delay=1 n
        # An all-digit first argument is taken as the delay in seconds
        if ! [[ $1 = *[^0-9]* ]]; then
            # TODO allow delay=0 (prevents Ctrl-C)
            if (($1 > 0)); then
                delay=$1
            fi
            shift
        fi
        # run command
        while ! "$@"; do
            echo "retrying in ${delay}s"
            for ((n=delay; n>0; n--)); do
                sleep 1 || return
            done
        done
    }; export -f retry

Patrick · Accepted Answer · 2011-12-30 11:43:04Z

Instead of checking what's missing after the initial download script has completed, consider adding some checks to the download script itself. I have not tested the following; I just wrote it off the top of my head:

    cat files_to_download | while read file; do
        SUCCESS="False"
        while [[ $SUCCESS == "False" ]]; do
            wget $file
            if [[ $? -eq 0 ]]; then
                SUCCESS="True"
            fi
        done
    done

Kevin · Accepted Answer · 2012-01-28 05:15:03Z

I would recommend echoing a copy of the line when the download fails, rather than reviewing the filenames by parsing each line in the file afterwards:

    [[ -f $1 ]] || { echo "$1 not found" >&2; exit 1; }
    while read -r line; do
        $line || echo "$line" >> stillneed
    done < "$1"

That would be more efficient, and it also means you don't need to worry about any odd filenames in the future (e.g. ones with spaces in them).
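
As a rough way to try this out without downloading anything, the sketch below wraps the same loop in a function and feeds it stand-in commands. The function name, the temporary file names, and the `</dev/null` redirection (which keeps a command from consuming the loop's input) are my own assumptions, not part of the snippet above.

```shell
#!/bin/bash
# Run each line of a command file; append the lines that fail to a log.
# Function and argument names are illustrative only.
run_and_log_failures() {
    local cmdfile=$1 failed=$2 line
    [[ -f $cmdfile ]] || { echo "$cmdfile not found" >&2; return 1; }
    while read -r line; do
        # Unquoted on purpose: the line is split into command and arguments.
        $line </dev/null || echo "$line" >> "$failed"
    done < "$cmdfile"
}

# Demo with stand-in commands: "true" succeeds, "false" fails.
tmp=$(mktemp -d)
printf '%s\n' 'true first' 'false second' 'true third' > "$tmp/commands"
run_and_log_failures "$tmp/commands" "$tmp/stillneed"
cat "$tmp/stillneed"
rm -rf "$tmp"
```

Only the line whose command returned a non-zero status ends up in the log, so the log can be fed back in as the next command file.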

If you want to improve your existing method, you could use standard parameter expansion:

    for f; do
        while read -r line; do
            for item in $line; do
                [[ $item = out_fname=* ]] || continue
                [[ -f ${item#out_fname=} ]] || echo "$line"
                break # assuming one fname per line
            done
        done < "$f"
    done > stillneed

But consider what happens with out_fname='foo bar.ext'. Also, bear in mind that this checks every line after the event, when we could just have checked whether the command worked at the time we ran it.
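
To make the problem concrete, here is a small sketch of what the unquoted loop sees for such a line. The sample command line and the function name are hypothetical.

```shell
#!/bin/bash
# Hypothetical command line; "fetch" stands in for the real download command.
line="fetch url=http://example.com/x out_fname='foo bar.ext'"

# Print the target that an unquoted "for item in $line" loop would test.
target_seen_by_loop() {
    local item
    for item in $1; do    # deliberate word splitting, as in the loop above
        [[ $item = out_fname=* ]] || continue
        printf '%s\n' "${item#out_fname=}"
        return
    done
}

target_seen_by_loop "$line"
```

This prints `'foo`, quote character included: the name is cut at the space, and the quotes are treated as ordinary characters because word splitting does not interpret them. The existence test would then look for the wrong file.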

Opening stillneed once for the whole loop is more efficient; I didn't do that in the first snippet as we'd most likely want to see output from the download commands. Here there are only tests and no external commands running, so it makes sense to open the file once. (Note that using > will truncate the file at the start. I've used for f to allow more than one input file as positional parameters; adding the same to the first snippet should be easy if you need it.)
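
The two redirection styles can be compared side by side in a short sketch. The helper names and the three sample lines are invented for illustration.

```shell
#!/bin/bash
# Two hypothetical helpers that write the same three lines to a file.
append_each_time() {    # reopens the file on every iteration
    local out=$1 i
    : > "$out"          # start from an empty file
    for i in 1 2 3; do
        echo "line $i" >> "$out"
    done
}

open_once() {           # one open (and truncate) for the whole loop
    local out=$1 i
    for i in 1 2 3; do
        echo "line $i"
    done > "$out"
}

tmp=$(mktemp -d)
append_each_time "$tmp/a"
open_once "$tmp/b"
if cmp -s "$tmp/a" "$tmp/b"; then
    echo "both files have the same contents"
fi
rm -rf "$tmp"
```

The results are identical; the second form just avoids reopening the file for every line, and it truncates any previous contents without an explicit step.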

The one thing I must stress is quoting: echo "$line" is very different from echo $line. In general, quote all parameter expansions (this includes variables) unless you know for sure that you want field-splitting to happen.
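
A quick sketch of the difference, using a made-up line with repeated spaces so the effect is visible (the function names are illustrative only):

```shell
#!/bin/bash
# Hypothetical line with runs of spaces between the words.
line="wget   http://example.com/a   out_fname=a.txt"

unquoted() { echo $1; }      # field splitting collapses runs of whitespace
quoted()   { echo "$1"; }    # the string is passed through unchanged

unquoted "$line"
quoted "$line"
```

The unquoted version prints the words separated by single spaces, while the quoted version preserves the line exactly. An unquoted expansion is also subject to pathname expansion, so a line containing `*` could be replaced by a list of matching file names.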
