
I have a file containing only numbers, one number per line. I want to find out the number of lines in which the number is greater than 100 (or in fact any other threshold). How can I do that?


2 Answers


Let's consider this test file:

$ cat myfile
98
99
100
101
102
103
104
105

Now, let's count the number of lines with a number greater than 100:

$ awk '$1>100{c++} END{print c+0}' myfile
5

How it works

  • $1>100{c++}

    Every time that the number on the line is greater than 100, the variable c is incremented by 1.

  • END{print c+0}

    After we have finished reading the file, the variable c is printed.

    By adding 0 to c, we force awk to treat c as a number. If there were any lines with numbers >100, then c is already a number. If there were not, then c is an uninitialized variable, which prints as an empty string (hat tip: iruvar). Adding zero to it converts that empty string to a 0, giving the correct output.
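The difference is easiest to see on input where nothing matches; a minimal check, assuming seq and a POSIX awk are available:

```shell
#!/bin/sh
# No value here exceeds 100, so c is never set: printing it bare gives an empty line.
seq 1 5 | awk '$1>100{c++} END{print c}'
# Adding 0 coerces the uninitialized variable to the number 0.
seq 1 5 | awk '$1>100{c++} END{print c+0}'
# The threshold itself can be passed in with -v, covering the "anything else"
# part of the question.
seq 98 105 | awk -v t=103 '$1>t{c++} END{print c+0}'
```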

  • I would change the print c to print 0+c or even print +c so a sane value of 0 is printed when no line exists with a number greater than 100. Commented Sep 26, 2016 at 4:05
  • @iruvar Good point! Thanks. Answer updated with +0 to force conversion to a number. Commented Sep 26, 2016 at 5:32

Similar solution with perl

$ seq 98 105 | perl -ne '$c++ if $_ > 100; END{print $c+0 ."\n"}'
5


Speed comparison: numbers reported for 3 consecutive runs

Random file:

$ perl -le 'print int(rand(200)) foreach (0..10000000)' > rand_numbers.txt
$ perl -le 'print int(rand(100200)) foreach (0..10000000)' >> rand_numbers.txt
$ shuf rand_numbers.txt -o rand_numbers.txt
$ tail -5 rand_numbers.txt
114
100
66125
84281
144
$ wc rand_numbers.txt
20000002 20000002 93413515 rand_numbers.txt
$ du -h rand_numbers.txt
90M rand_numbers.txt

With awk

$ time awk '$1>100{c++} END{print c+0}' rand_numbers.txt
14940305

real    0m7.754s
real    0m8.150s
real    0m7.439s

With perl

$ time perl -ne '$c++ if $_ > 100; END{print $c+0 ."\n"}' rand_numbers.txt
14940305

real    0m4.145s
real    0m4.146s
real    0m4.196s

And just for fun with grep (Updated: faster than even Perl with LC_ALL=C)

$ time grep -xcE '10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}' rand_numbers.txt
14940305

real    0m10.622s

$ time LC_ALL=C grep -xcE '10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}' rand_numbers.txt
14940305

real    0m0.886s
real    0m0.889s
real    0m0.892s
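The alternation enumerates every decimal form of an integer greater than 100, and -x requires the whole line to match. A quick boundary check (assuming a grep with -E support):

```shell
#!/bin/sh
# 10[1-9]         -> 101..109
# 1[1-9][0-9]     -> 110..199
# [2-9][0-9]{2,}  -> 200..999 and longer numbers starting with 2-9
# 1[0-9]{3,}      -> 1000 and up starting with 1
# 99 and 100 must not match; 101 and 1000 must, so the count should be 2.
printf '99\n100\n101\n1000\n' | grep -xcE '10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}'
```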

sed is no fun:

$ time sed -nE '/^10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}$/p' rand_numbers.txt | wc -l
14940305

real    0m11.929s

$ time LC_ALL=C sed -nE '/^10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}$/p' rand_numbers.txt | wc -l
14940305

real    0m6.238s
  • To be fair, compare apples to apples: compare grep w/o -c piped through wc -l to the sed solution, but I expect sed would still be slower. Commented Dec 12, 2017 at 4:48
  • Yeah, I had included sed only because it was tagged by the OP. sed isn't the tool to use for arithmetic, and I was actually surprised when I checked grep with LC_ALL=C today, which prompted the edit. Commented Dec 12, 2017 at 5:05
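The apples-to-apples variant suggested above drops grep's -c and counts via the same pipe the sed solution uses; a sketch on the small sample from earlier:

```shell
#!/bin/sh
# Same regex and -x whole-line matching as before, but counting with wc -l
# so grep pays the same pipe cost as the sed | wc -l pipeline.
printf '99\n100\n101\n1000\n' | grep -xE '10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}' | wc -l
```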
