I have a bash script that computes the maximum TPS (transactions per second) from an application's log file. The script works, but it takes several hours to run on files with millions of entries. The log entries follow this pattern:
    2015-11-01 14:34:20,969 TRACE [Thread-2868] [TrafficLogger] service transaction data
    2015-11-01 14:34:20,987 TRACE [Thread-2868] [TrafficLogger] service transaction data

The script loops over every possible hour:minute:second combination, greps for each one to count the matches, and compares each count against the previous highest to track the peak TPS:
    peak_tps=0
    for h in {00..23}; do
        for m in {00..59}; do
            for s in {00..59}; do
                tps=$(grep -c "${h}:${m}:${s}" $log_file)
                if [ "$tps" -gt "$peak_tps" ]; then
                    peak_tps=$tps
                fi
            done
        done
    done

This is the straightforward way to compute the max TPS, but I'm wondering if there's a way to optimize it, maybe using some heuristics about the input: (1) the input file is sorted by the timestamp; (2) it only contains entries for one day (i.e. the first column is constant).
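For scale, the inner grep runs once for every possible second of the day, and each run scans the entire multi-million-line file:

    # number of full passes over the log file made by the triple loop
    echo $((24 * 60 * 60))    # 86400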
I've tried a couple of things: (1) adding --mmap to grep; (2) pre-extracting the distinct timestamps and only searching for those:
    for timestamp in $(awk '{print $2}' $log_file | cut -d \, -f 1 | sort -u); do
        tps=$(grep --mmap -c "$timestamp" $log_file)
        ...
    done

Neither has yielded much improvement. I'm sure this is a classic test question, but I can't seem to find the answer. Can you guys help?
Regards!
Is this what you're looking for?

    cut -c12-19 $log_file | uniq -c | sort -rn | head -1

A commenter questioned the order of `uniq` and `sort` in that pipeline, since `uniq` only combines (and counts) matching adjacent lines, and suggested putting `sort` before `uniq`, as in `cut -c12-19 $log_file | sort -rn | uniq -c | head -1`. The reply: normally you would put `sort` before `uniq`, but this relies on the OP's claim that the input is already sorted, so `uniq` before `sort` works in this case.

The OP confirmed: "Ah, `uniq` before `sort` in this case ... and yes, that solution works for me. I'm used to using `sort -u` and didn't know about the `-c` option for `uniq`. I tried it and it now takes minutes vs. hours previously! Thanks so much! If you post the answer I'll select it."
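For reference, an equivalent single pass that does not depend on the input being sorted can be sketched in awk (this assumes, as in the sample entries above, that the timestamp is the second whitespace-separated field):

    # count lines per HH:MM:SS and keep the running maximum in a single pass
    awk '{
        sec = substr($2, 1, 8)            # "14:34:20,969" -> "14:34:20"
        if (++count[sec] > max) max = count[sec]
    } END { print max + 0 }' $log_file

Like the `cut | uniq -c | sort -rn` pipeline above, this reads the file exactly once, which is where the hours-to-minutes speedup comes from; the array of per-second counts stays small because one day has at most 86,400 distinct seconds.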