Used `==` where I should have used `eq`. Fixed.

edited Oct 7, 2015 at 3:52

This Perl script stores each log line that matches the criteria (contains "image/", is larger than 100000 bytes, and has a referrer of '-') in a hash of arrays keyed by IP address. At the end of the script, it prints every stored line for each IP address that has more than 14 entries.

It uses a lot of memory, but not as much memory as it would if it stored every input line.

You could condense it into a one-liner, but you'd just be making it unreadable and undebuggable for no good reason.

    #! /usr/bin/perl

    use strict;

    my %LOGLINES = ();

    while (<>) {
        next unless (/\bimage\//);

        my @F=split("\t");
        next unless ($F[10] eq '-');
        next unless ($F[13] > 100000);

        push @{ $LOGLINES{$F[2]} }, $_;
    };

    foreach my $key (sort keys %LOGLINES) {
        print @{ $LOGLINES{$key} } if (scalar @{ $LOGLINES{$key} } > 14);
    }

Note that Perl arrays are zero-based, not one-based, so the field numbers in the script are one less than the ones you specified.
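As a minimal sketch of that offset, here is a made-up tab-separated line (the field values are hypothetical and not taken from your log) split into an array. The third field, counting from one, ends up in `$F[2]`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tab-separated line; the values are made up for illustration.
my $line = join("\t", 'a', 'b', '192.0.2.1', 'd');

my @F = split /\t/, $line;

# Field 3 (counting from 1) is element 2 (counting from 0).
print "field 3 = $F[2]\n";
```

The same reasoning gives `$F[10]` for field 11 and `$F[13]` for field 14 in the script above.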

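Here's another version that doesn't use anywhere near as much memory, because it buffers at most 14 lines for each IP address it sees. On the 15th matching line for an address it prints the buffered lines plus the current one, and after that it prints each matching line as soon as it sees it. The disadvantage is that the output isn't sorted by IP address, but that's easily solved by piping to `sort -t $'\t' -k3,3` (the IP address is the third field, i.e. `$F[2]`).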

    #! /usr/bin/perl

    use strict;

    my %LOGLINES = ();
    my %count = ();

    while (<>) {
        next unless (/\bimage\//);

        my @F=split("\t");
        next unless ($F[10] eq '-');
        next unless ($F[13] > 100000);   # same size threshold as the first script

        $count{ $F[2] }++;

        if ($count{ $F[2] } == 15) {
            print @{ $LOGLINES{$F[2]} };     # print all the log lines we've seen so far
            print $_;                        # print the current line
            delete $LOGLINES{$F[2]};         # free the buffered lines, they're no longer needed
        } elsif ($count{ $F[2] } > 15) {
            print $_;                        # print the current line
        } else {
            push @{ $LOGLINES{$F[2]} }, $_;  # store the log line for later use
        }
    };

answered Oct 7, 2015 at 2:55

cas
