
I have a directory similar to the following:

-rw-r--r-- 1 root root 223K Apr 28 14:25 2015.04.28_14.25
-rw-r--r-- 1 root root 253K Apr 28 14:55 2015.04.28_14.55
-rw-r--r-- 1 root root 276K Apr 28 15:25 2015.04.28_15.25
-rw-r--r-- 1 root root 254K Apr 28 15:55 2015.04.28_15.55
-rw-r--r-- 1 root root 122K Apr 29 09:08 2015.04.29_09.08
-rw-r--r-- 1 root root 127K Apr 29 09:38 2015.04.29_09.38
-rw-r--r-- 1 root root  67K Apr 29 11:43 2015.04.29_11.43
-rw-r--r-- 1 root root 137K May  1 12:13 2015.04.29_12.13
-rw-r--r-- 1 root root 125K May  1 12:43 2015.04.29_12.43
-rw-r--r-- 1 root root 165K May  1 13:13 2015.04.29_13.13
-rw-r--r-- 1 root root 110K May  1 13:43 2015.04.29_13.43

My question is, how would I find the largest file from each date?

For example, largest file from Apr 28, largest from Apr 29, May 1, etc.

OS info: Linux Kali 3.18.0-kali3-amd64 #1 SMP Debian 3.18.6-1~kali2 (2015-03-02) x86_64 GNU/Linux

  • What OS? Do you have GNU stat available? Commented Jun 3, 2015 at 15:30
  • @jordanm OS details added. Yes I do. Commented Jun 3, 2015 at 15:49
  • Is it the date in the file name or the modification time that matters? Commented Jun 3, 2015 at 21:36
  • @Gilles The modification time. Commented Jun 4, 2015 at 8:18

3 Answers


On GNU/anything,

ls -l --time-style=+%s \
| awk '{$6 = int($6/86400); print}' \
| sort -nk6,6 -nrk5,5 \
| sort -sunk6,6

That will get you UTC day boundaries; add your local time offset to the calculation as needed, e.g. int(($6-7*3600)/86400) for -0700 midnight boundaries.

  • Thanks. Marked as answer. Could you explain a bit more about what this is doing though? Commented Jun 4, 2015 at 8:20
  • 1
    size is the fifth field, date's the sixth, +%s is seconds since 1 Jan 1970 00:00 UTC, that /86400 is days since, so the first sort is by day and descending size and the second is "stable" -- give up speed to keep things in input order when you can, "unique" -- select only the first record for each key. First one it sees in input order for each day will be the largest. Commented Jun 4, 2015 at 9:22
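To see the day-bucketing trick from that answer in isolation, here is a hedged sketch that feeds hand-made (epoch-seconds, size, name) records through the same two-pass sort; the field numbers are shifted to 1 and 2, and the input values are invented for illustration:

```shell
# Made-up (epoch, size, name) records standing in for the ls output.
printf '%s\n' \
  '1430229300 223 a' \
  '1430231100 253 b' \
  '1430313480 122 c' \
  '1430315280 127 d' \
| awk '{$1 = int($1/86400); print}'   `# epoch seconds -> day number` \
| sort -k1,1n -k2,2nr                 `# by day, then size descending` \
| sort -s -u -k1,1n                   `# keep first (largest) per day`
# prints:
# 16553 253 b
# 16554 127 d
```

The stable, unique second pass is what makes "first seen per day" equal "largest per day", since the first pass already put the largest record of each day first.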

An approach based on stat to obtain the file information and awk to determine the maximum for each date:

stat -c $'%.10y\t%s\t%n' * | awk 'BEGIN { FS=OFS="\t" } s[$1]<$2 { s[$1]=$2 ; n[$1]=$3 } END { for (d in n) print d,s[d],n[d] | "sort" }' 

The output will be a tab-separated list of (date, size, filename) tuples.
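To watch the awk half of this in isolation, here is a sketch that replays the same max-per-key selection on inline sample records (made-up dates, sizes, and names) instead of real stat output:

```shell
# Made-up tab-separated (date, size, name) records replacing stat output.
printf '2015-04-28\t223\ta\n2015-04-28\t253\tb\n2015-04-29\t122\tc\n' \
| awk 'BEGIN { FS=OFS="\t" }
       s[$1]<$2 { s[$1]=$2; n[$1]=$3 }            # remember the max per date
       END { for (d in n) print d, s[d], n[d] | "sort" }'
# prints one line per date: its largest size and the matching name
```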


I would script it around 'ls' to get the files in size order, and then limit the result to get the largest.

For example, 'ls -lS 2015.04.29*' lists the files in descending size order, and 'ls -lS 2015.04.29* | head -1' should give you the largest.

From there you can strip out just the file name, depending on your need. Essentially it is a for loop over the dates found in the filenames, with an ls command per date to get that date's largest file.
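The loop described above could be sketched roughly as follows. Note this keys on the date in the filename, not the modification time the asker ultimately wanted; the truncate lines just fabricate sample files matching the question so the sketch runs standalone:

```shell
# Create sample files matching the question's naming scheme (sizes invented).
cd "$(mktemp -d)"
truncate -s 223K 2015.04.28_14.25
truncate -s 276K 2015.04.28_15.25
truncate -s 122K 2015.04.29_09.08
truncate -s 137K 2015.04.29_12.13

# Derive the date prefixes from the filenames, then let ls -S
# (sort by size, largest first) pick one file per date.
for date in $(ls | cut -d_ -f1 | sort -u); do
    ls -S "${date}"_* | head -n 1
done
# prints:
# 2015.04.28_15.25
# 2015.04.29_12.13
```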
