
I would like to download a large set of data files covering 1981 to 2000 (20 years), recorded every 10 minutes. I was trying to write a script that generates every timestamp and downloads the data, but I am unable to complete it: I can't handle leap years or the number of days in each month. My script is:

#!/bin/sh
for yr in {1981..2000}; do
  for mm in 01 02 03 04 05 06 07 08 09 10 11 12; do
    for dd in {1..31}; do
      if [[ $dd -le 9 ]]; then nn=0$dd; else nn=$dd; fi
      for tt in 00 10 20 30 40 50; do
        echo wget www.xyz.com/$yy/$mm/$nn/$tt.txt
      done
    done
  done
done

How can I handle leap years and the varying number of days in each month?


3 Answers


You seem to have left out the hours.

Assuming you have GNU date, you can deal with it by using the date calculations. Do you have to worry about switches between winter and summer (standard and daylight saving) time? If so, there'll be some entertainment to be had with gaps of an hour in the spring and a period in the fall when the raw date/time values repeat.

$ /opt/gnu/bin/date -d '1981-01-01 00:00:00' +'%s %Y-%m-%d %H:%M:%S'
347184000 1981-01-01 00:00:00
$ /opt/gnu/bin/date -d '2000-12-31 23:50:00' +'%s %Y-%m-%d %H:%M:%S'
978335400 2000-12-31 23:50:00
$

That gives you start and end times in Unix timestamp notation (and in the US/Pacific time zone — adjust to suit your needs). You could then use a loop such as:

now=347184000
end=978335400
while [ "$now" -le "$end" ]
do
    url=$(date -d "@$now" +'www.example.com/%y/%m/%d/%H/%M.txt')
    echo wget "$url"
    now=$(($now + 600))
done

There are multiple ways of writing that. I've assumed that there's a directory of hourly files, and within that the 10-minute files, but you can tweak the format to suit your requirements. The use of @ in the -d is crucial.
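To see that @ syntax in isolation, here is a small illustration (assuming GNU date; -u is used so the output does not depend on your local time zone):

```shell
# '@' tells GNU date that the argument is a Unix timestamp rather than a
# calendar date. 347184000 is midnight 1981-01-01 in US/Pacific (UTC-8),
# so formatted in UTC it shows 08:00.
date -u -d '@347184000' +'%Y-%m-%d %H:%M:%S'
# → 1981-01-01 08:00:00
```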

You might prefer to use a scripting language such as Perl or Python instead of repeatedly invoking date as shown.
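If you'd rather stay in the shell, one sketch of avoiding a date process per iteration (assuming GNU coreutils, whose date accepts -f to read one date string per line from a file or stdin) is:

```shell
# Generate the first hour's worth of timestamps (7 values, 600 s apart),
# prefix each with '@' so date treats it as a Unix timestamp, and format
# them all with a single date invocation via '-f -' (read from stdin).
# www.example.com is a placeholder host; -u keeps the output TZ-independent.
seq 347184000 600 347187600 \
  | sed 's/^/@/' \
  | date -u -f - +'www.example.com/%Y/%m/%d/%H/%M.txt'
```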

Note that you have a vast number of files to collect. With about 31 million seconds per year, and 600 seconds per 10 minute interval, you're looking at over 50,000 files per year for 20 years, or 1 million files in total. The target (victim) web site might not be happy with you running that flat out. You'd probably need to pace the retrieval operations — check their terms and conditions.
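One hedged sketch of pacing the retrieval: build the full URL list up front (a single date invocation can format every timestamp), then hand the list to wget, whose standard --wait, --random-wait and -i flags pace and batch the downloads. The host and the 1-second delay are placeholders:

```shell
# Build the full 20-year URL list in one date invocation. UTC is used
# here to sidestep DST gaps/repeats; adjust the TZ and host to suit.
seq 347184000 600 978335400 \
  | sed 's/^/@/' \
  | date -u -f - +'http://www.example.com/%Y/%m/%d/%H/%M.txt' > urls.txt

wc -l urls.txt   # roughly a million URLs

# Fetch politely: -i reads URLs from the file, --wait pauses between
# retrievals, --random-wait varies the pause to be gentler on the server.
# (Commented out so this sketch doesn't actually hit the network.)
# wget --wait=1 --random-wait -i urls.txt
```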


1 Comment

Great job, smart and simple.

This is one way to do it (please note that this leap-year calculation is only good until 2100):

#!/bin/sh
for yr in {1981..2000}; do
  for mm in 1 2 3 4 5 6 7 8 9 10 11 12; do
    for dd in {1..31}; do
      if [[ $dd -eq 31 ]] && ( [[ $mm -eq 4 ]] || [[ $mm -eq 6 ]] || [[ $mm -eq 9 ]] || [[ $mm -eq 11 ]] ); then
        continue
      elif ( [[ $dd -gt 28 ]] && [[ $mm -eq 2 ]] && [[ $(( $yr % 4 )) -ne 0 ]] ) || ( [[ $dd -gt 29 ]] && [[ $mm -eq 2 ]] ); then
        continue
      fi
      if [[ $mm -le 9 ]]; then mon=0$mm; else mon=$mm; fi
      if [[ $dd -le 9 ]]; then nn=0$dd; else nn=$dd; fi
      for tt in 00 10 20 30 40 50; do
        echo wget www.xyz.com/$yr/$mon/$nn/$tt.txt   # $yy in the question was a typo for $yr
      done
    done
  done
done
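The "good until 2100" caveat comes from testing only year % 4. The full Gregorian rule (divisible by 4, except centuries, except every 400 years) can be sketched as a small shell function (is_leap is an illustrative name, not part of the original script):

```shell
# Full Gregorian leap-year test: divisible by 400 -> leap;
# otherwise divisible by 100 -> common; otherwise divisible by 4 -> leap.
is_leap() {
    if [ $(($1 % 400)) -eq 0 ]; then return 0
    elif [ $(($1 % 100)) -eq 0 ]; then return 1
    elif [ $(($1 % 4)) -eq 0 ]; then return 0
    else return 1
    fi
}

is_leap 2000 && echo "2000: leap"      # century divisible by 400
is_leap 2100 || echo "2100: common"    # century not divisible by 400
is_leap 1984 && echo "1984: leap"
```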

2 Comments

Thank you very much for your help. This is working fine, but I forgot to include the hour field. Will it be okay if I insert for hr in {1..24}; do immediately after for dd in {1..31}; do?
@Kayan: Yes, you can add the hours in as another loop. You might want to look at using printf (the command) to get the leading zeros in the right places all at once: printf "www.example.com/%.2d/%.2d/%.2d/%.2d/%.2d.txt" $yr $mm $dd $hr $tt or thereabouts. You might need to think about using for hr in {0..23}; rather than {1..24}.
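The printf suggestion from that comment can be checked in isolation (the values here are illustrative):

```shell
# %.2d pads each numeric field to at least two digits; a 4-digit year
# passes through unchanged, so there is no need for the 0$dd dance.
printf 'www.example.com/%.2d/%.2d/%.2d/%.2d/%.2d.txt\n' 1981 2 3 4 50
# → www.example.com/1981/02/03/04/50.txt
```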

I would use something that figures out the leap years etc. for me, i.e. date. The following might give a hint on how to do this.

The way you're using wget means it's going to create a bunch of files like

"10.txt.1" "10.txt.2" "10.txt.3" "10.txt.4" "10.txt.5" 

This might be fine, but if you want to put these in a directory of their own, or to name the files something else:

#!/bin/bash
# Jan 01 1981, 00:00 UTC (the original comment said 1980, but
# 347155200 is 1981-01-01 00:00:00 UTC)
COUNTER=347155200
while [ $COUNTER -lt 978263999 ]; do
    year=$(date -r $COUNTER +"%y")
    month=$(date -r $COUNTER +"%m")
    day=$(date -r $COUNTER +"%d")
    hour=$(date -r $COUNTER +"%H")
    min=$(date -r $COUNTER +"%M")
    let COUNTER=COUNTER+600
    url="www.xyz.com/$year/$month/$day/$hour/$min.txt"
    dir="$year/$month/$day/$hour"
    file="$year/$month/$day/$hour/$min.txt"
    mkdir -p "$dir"
    wget -O "$file" "$url"   # -O names the saved file; without it, wget would treat $file as a second URL
    # Post-process files here...
done

8 Comments

I just corrected my original to be ten minutes and not ten seconds.
It would be worth explaining where the 347155200 and 978263999 values come from (UTC time zone, by the looks of it). Also, why not get a single date command to format the whole lot in one invocation, rather than using 5 invocations per iteration.
Also, which variant of date uses -r to indicate the reference time? GNU date expects -r to be given a file name, and the modification time of that file controls the reference time. See my answer using -d "@$unixtime" to generate the reference time as of the time in the variable $unixtime.
@JonathanLeffler The above was my first stab at the answer. I did upvote yours because it's shorter etc. Sometimes a different way of looking at things is useful. Thanks for the downvote.
I don't down-vote very often — you can look at my record. I didn't down-vote. I only made some suggestions and asked some questions.