5

I'm downloading a file and then unzipping it from a Bash script file.

#!/bin/sh
wget -N http://example.com/file.zip
unzip -o file.zip

Is there a way to check if wget actually downloaded a new file? For instance, if the remote version of file.zip is the same as the local version it will not retrieve the file. I only want to unzip the file if wget actually retrieves a new file.

2
  • 2
    Checksum and compare with previous value (stored in a txt somewhere) Commented Sep 1, 2015 at 4:07
  • 1
    Both curl and wget can be told not to download a file if it hasn't changed. See stackoverflow.com/q/32322456/258523 for a recent question and answer about this for wget. Commented Sep 1, 2015 at 12:03
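The checksum idea from the first comment could be sketched roughly as below. This is only a minimal sketch: `file_changed` and the `file.zip.cksum` state file are hypothetical names, and POSIX `cksum` is used as the checksum tool (a stronger hash such as `sha256sum` could be swapped in where available).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeeds (exit 0) when the
# checksum of FILE differs from the value stored in STATE, and then records
# the new checksum. Returns 1 when unchanged, 2 when FILE cannot be read.
file_changed() {
    file=$1
    state=$2    # plain text file holding the previous checksum (assumed name)

    # Reading from stdin makes cksum print only "<crc> <size>", no file name.
    new=$(cksum < "$file") || return 2
    old=$(cat "$state" 2>/dev/null)

    if [ "$new" = "$old" ]; then
        return 1    # same content as last time
    fi
    printf '%s\n' "$new" > "$state"
    return 0        # first run, or content changed
}

# Usage sketch (commented out so the block has no side effects):
#   wget -N http://example.com/file.zip
#   file_changed file.zip file.zip.cksum && unzip -o file.zip
```

The state file is only updated when the content changes, so an interrupted run before the unzip step would need its own handling.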

4 Answers 4

4

Check both the return value and the output of wget to determine whether the file was downloaded:

out=$(wget -qN 'http://example.com/file.zip' 2>&1)
[[ $? -eq 0 && $out ]] && unzip file.zip

If file.zip already exists locally with the same timestamp, wget does not download it and writes nothing to stdout or stderr, so the out variable stays empty.


2 Comments

I'm getting an error "[[: not found" on the last line. Also, is it possible to have it display the output when you assign it to out?
"[[: not found" means you're not using bash. Make sure to run the script with bash. To display the output, add echo "$out" after the first line.
2

Don't use the Last-Modified header; it depends on the server. @anubhava's answer also works, but this has less overhead and is slightly more portable across Bourne shell variants:

This gets you what you need:

wget -N http://example.com/file.zip 2>&1 | grep "not retrieving" > /dev/null 2>&1 || unzip file.zip
  1. Get the file.
  2. Redirect stderr to stdout.
  3. Check whether "not retrieving" appears in the output (what wget prints when it does not download the file).
  4. If the "not retrieving" string is not in the output, grep returns exit code 1 and the file is unzipped. Otherwise, it just moves on silently.

It's essentially saying this, with more detail added for readability:

out=$(wget -N http://example.com/file.zip 2>&1)
if printf '%s\n' "$out" | grep -q "not retrieving"; then
    echo "No new file; not unzipping"
else
    unzip file.zip
fi
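The same check can be wrapped in a small function so the string match is separate from the download itself. This is a sketch, not the answerer's code: `fetched_new_file` is a hypothetical name, and the "not retrieving" match string is taken from the answer above. Other wget versions may word the notice differently, so treat the pattern as an assumption to verify against your own wget output.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): takes wget's captured output as
# its first argument and succeeds (exit 0) when the "not retrieving" notice
# is absent, i.e. when wget appears to have fetched a new copy.
fetched_new_file() {
    case $1 in
        *"not retrieving"*) return 1 ;;   # wget reported it skipped the file
        *)                  return 0 ;;   # no such notice in the output
    esac
}

# Usage sketch (commented out so the block has no side effects):
#   if out=$(wget -N http://example.com/file.zip 2>&1) && fetched_new_file "$out"; then
#       unzip -o file.zip
#   fi
```

Unlike the one-line pipeline, the usage sketch also requires wget to exit successfully, so a failed download does not trigger unzip.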


1

This is an old question, but the answers above no longer work for me. With wget's quiet option I get no output in either case. The -S option of wget, however, prints the HTTP status code:

  • 200 if the file was downloaded
  • 304 if the remote file is the same as the local one
  • anything else for all the "bad" situations

A solution that keeps @anubhava's approach:

# tail -n 1 keeps only the final response in case of redirects
out=$(wget -SN 'http://example.com/file.zip' 2>&1 | grep "HTTP/" | tail -n 1 | awk '{print $2}')
[[ $out -eq 200 ]] && unzip file.zip
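As a rough illustration, the status extraction can be pulled into its own function that reads captured `wget -S` output on stdin. `last_http_status` is a hypothetical name, and the sketch assumes each response shows up as a line whose first field starts with `HTTP/`, as in the grep above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): reads captured `wget -S` output
# on stdin and prints the status code of the last HTTP response line found.
# Prints nothing when no response line is present.
last_http_status() {
    awk '$1 ~ /^HTTP\// { code = $2 } END { if (code != "") print code }'
}

# Usage sketch (commented out so the block has no side effects):
#   status=$(wget -SN 'http://example.com/file.zip' 2>&1 | last_http_status)
#   [ "$status" = "200" ] && unzip -o file.zip
```

Taking the last response rather than the first means a redirect followed by 200 or 304 is judged by the final answer. The string comparison also treats an empty result as "do not unzip".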


-1

You can use

curl -I http://example.com/file.zip 

and check the Last-Modified: value.

You can also use wget --timestamping, but requesting the headers with a HEAD request gives you more control.
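One possible way to act on that header is sketched below. `last_modified` and the `file.zip.lastmod` state file are hypothetical names, and the sketch assumes the server actually sends a Last-Modified header, which not every server does.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): reads HTTP response headers on
# stdin (for example from `curl -sI URL`) and prints the value of the first
# Last-Modified header, matched case-insensitively. Prints nothing if absent.
last_modified() {
    awk '{ sub(/\r$/, "") }                      # drop the CR from CRLF endings
         tolower($1) == "last-modified:" {
             sub(/^[^:]*:[ \t]*/, "")            # strip the header name
             print
             exit
         }'
}

# Usage sketch (commented out so the block has no side effects):
#   new=$(curl -sI http://example.com/file.zip | last_modified)
#   old=$(cat file.zip.lastmod 2>/dev/null)
#   if [ -n "$new" ] && [ "$new" != "$old" ]; then
#       wget -N http://example.com/file.zip && unzip -o file.zip &&
#           printf '%s\n' "$new" > file.zip.lastmod
#   fi
```

The stored value is only updated after both the download and the unzip succeed, so a failed run is retried next time.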
