
I want to download a few thousand files one by one. The average size of each one is 5-10 MB. Each has a name of "name_{i}", where "i" is a counter. What's the easiest and best way to do that?

Note that the internet connection may be interrupted, and I also want to be able to stop the process and continue it later. In those cases, the next time I run the script (or whatever it turns out to be), it should pick up at the last downloaded file and re-download it if needed.

  • @drewbenn, web page. no. Commented Oct 23, 2015 at 19:20

3 Answers


I believe you can write a small shell script to do what you want. Use a for loop to go through the files, wget or similar to download each one, and after each download write the current counter to a file, so that after an interruption you can read back where you left off.

Example:

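```bash
FILE=./progress   # hypothetical name for the file that stores the last counter
START=1
# after an interruption, resume from the last recorded counter
if [ -f "$FILE" ]; then
    START=$(cat "$FILE")
fi
# brace ranges like {1..5} can't take variables, so use seq
for i in $(seq "$START" 5); do
    wget "https://foo.bar/name_$i"
    # record progress only after the download has finished
    echo "$i" > "$FILE"
done
```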

That's just the basic idea; there are probably some small errors, but I assume you get the idea.
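A detail that trips up loops like this one: bash performs brace expansion before it expands variables, so a range whose bounds are variables (such as {$COUNT..5}) is left unexpanded instead of becoming a list of numbers. The small demonstration below uses a made-up start value of 3; seq and a C-style for loop are two ways around it:

```shell
#!/usr/bin/env bash
# Brace expansion happens before variable expansion, so a range whose
# bounds come from variables is never expanded into a list of numbers.
start=3

# bash sees "{$start..5}" as an invalid sequence, leaves it as is,
# and only then substitutes $start:
broken=$(echo {$start..5})
echo "$broken"        # prints {3..5}

# seq evaluates its bounds at run time:
with_seq=$(seq "$start" 5 | tr '\n' ' ')
echo "$with_seq"      # prints 3 4 5

# so does a C-style arithmetic for loop, without an external command:
with_loop=$(for ((i = start; i <= 5; i++)); do printf '%s ' "$i"; done)
echo "$with_loop"     # prints 3 4 5
```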

  • Also, if you're going with wget, you might want to use the -c (aka --continue) option. Commented Oct 23, 2015 at 20:59
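With -c in place, the counter file is arguably optional: the script could instead look at which name_N files are already on disk, take the highest N, and restart there so wget -c can finish that file if it was cut off. A minimal sketch, assuming unpadded names saved under their server names in a single directory; the function name last_downloaded is made up for illustration, and the wget loop is commented out so the sketch runs without network access:

```shell
#!/usr/bin/env bash
# Sketch: find where to resume by looking at the files already on disk,
# instead of keeping a separate counter file. Assumes unpadded names
# name_1, name_2, ... saved directly in the directory passed as $1.
last_downloaded() {
    local dir=$1 f n max=0
    for f in "$dir"/name_*; do
        [ -e "$f" ] || continue            # the glob matched nothing
        n=${f##*/name_}                    # keep only the counter part
        [[ $n =~ ^[0-9]+$ ]] || continue   # skip names like wget's name_5.1
        if (( 10#$n > max )); then
            max=$((10#$n))
        fi
    done
    echo "$max"
}

start=$(last_downloaded .)
(( start >= 1 )) || start=1   # nothing downloaded yet
echo "resuming at name_$start"
# The highest-numbered file may be truncated, so fetch it again with -c:
# for i in $(seq "$start" 1000); do
#     wget -c "https://foo.bar/name_$i"
# done
```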
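```bash
BASE_URL='http://some.site.somewhere.com/some/path'
LASTFILE='./countfile'
last=1
# resume from the counter saved by a previous run, if any
[ -e "$LASTFILE" ] && last=$(cat "$LASTFILE")
# brace expansion runs before variable expansion, hence eval echo
for i in $(eval echo "{$last..1000}"); do
    # save the counter first, so an interrupted file is retried next time
    echo "$i" > "$LASTFILE"
    # -c continues a partial download instead of starting over
    wget -c "$BASE_URL/name_{$i}"
done
```

You said that each has a name of "name_{i}"; I'm not sure if that means the filenames literally contain {} curly brackets or not. If not, just remove the { and } from the wget line above.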

If the filenames have zero-padded numbers (e.g. 0005 rather than just 5), you can use seq -w instead of the eval brace expansion, like this:

```bash
for i in $(seq -w "$last" 1000); do ... done
```
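To show what -w does: seq pads every number to the width of the widest one in the range. One side effect is that the counter file then holds values like 0998, which seq reads as decimal but bash arithmetic would treat as octal (a leading zero means octal there), so 10# is used to force base 10. The fixed 5-digit width in the printf line is only an assumption about how a server might name its files:

```shell
#!/usr/bin/env bash
# seq -w pads every number to the width of the widest one in the range.
padded=$(seq -w 998 1000 | tr '\n' ' ')
echo "$padded"            # prints 0998 0999 1000

# The counter file will then hold values such as "0998". seq reads that as
# decimal, but in bash arithmetic a leading zero means octal, so $((0998))
# would fail; 10# forces base 10.
last=0998
next=$((10#$last + 1))
echo "$next"              # prints 999

# If the server pads to a fixed width instead (5 digits is an assumption),
# printf states the width explicitly.
printf -v name 'name_%05d' "$next"
echo "$name"              # prints name_00999
```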

Have a look at lftp's mirror option:

mirror [OPTS] [source [target]]

 Mirror specified source directory to local target directory. If the target directory ends with a slash (except the root), the source base name is appended to target directory name. Source and/or target can be URLs pointing to directories. 

See http://lftp.yar.ru/lftp-man.html for additional details.

EDIT

From the manual:

lftp is a file transfer program that allows sophisticated FTP, HTTP and other connections to other hosts. If site is specified then lftp will connect to that site otherwise a connection has to be established with the open command.

 lftp can handle several file access methods - FTP, FTPS, HTTP, HTTPS, HFTP, FISH, SFTP and file (HTTPS and FTPS are only available when lftp is compiled with GNU TLS or OpenSSL library). 

lftp can be used to get files over HTTP. Try:

```bash
lftp -e "mirror -c" http://url
```
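mirror needs a directory listing to work from. If the server has none, lftp can still be driven one file at a time. The sketch below (the helper name make_lftp_script and the file name fetch.lftp are hypothetical) writes an lftp command script with an open line and one get -c per file, where -c asks lftp to continue a partial download. It does not run lftp itself, and I haven't verified how every HTTP server responds to resumed requests:

```shell
#!/usr/bin/env bash
# Sketch: without a directory listing, "mirror" has nothing to work from,
# but lftp can still be driven file by file. This writes an lftp command
# script with one "get -c" (continue a partial download) per file.
make_lftp_script() {
    local base=$1 start=$2 end=$3 i
    printf 'open %s\n' "$base"
    for ((i = start; i <= end; i++)); do
        printf 'get -c name_%s\n' "$i"
    done
}

# Placeholder URL and range; start could be read from a counter file.
make_lftp_script 'http://some.site.somewhere.com/some/path' 1 1000 > fetch.lftp
head -n 3 fetch.lftp
# Then run it with:  lftp -f fetch.lftp
```

Re-running the whole script revisits every file, so for long runs the start value is better taken from a counter file, as in the other answers.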
  • I don't understand how it can help me, there's no directory on the server. Commented Oct 23, 2015 at 19:21
