Counting matches with grep, when multiple matches per line are possible

Question

The standard usage of grep is to return lines that match a pattern.

If a line can contain several matches of the pattern, how can I count each match individually, rather than the number of matching lines?

Does this answer your question? Count total number of occurrences using grep
– G-Man Says 'Reinstate Monica', Commented Oct 12, 2023 at 8:23
Also similar: Counting occurrences of [a] word in [a] text file.
– G-Man Says 'Reinstate Monica', Commented Oct 12, 2023 at 8:23
Note that despite OP's confusing edits, this question is asking the same thing: "what will isolate each match on a line of its own" is exactly what is done in "grep's -o will only output the matches" and is what's in the accepted answer as well.
– muru, Commented Nov 6, 2023 at 8:03
And now OP has edited the question yet again to ask something else altogether. Please don't drastically change questions that have been answered multiple times.
– muru, Commented Nov 7, 2023 at 9:16
Please don't change your question after receiving answers. You can clarify, but not completely change it so that the answers are no longer relevant. I have rolled back your edits to the last version that seems to match the answers you have been given. If you have more questions, please ask them separately, as new questions.
– terdon ♦, Commented Nov 7, 2023 at 9:28

Kusalananda · Accepted Answer · 2023-10-06 18:20:03Z

The grep command has a -c option that counts the number of lines matched by a pattern. Since the standard usage of grep is to return lines that match a pattern, this solves the task "count the number of matches".

If a line can contain several matches of the pattern, you may use grep with its non-standard -o option if you want to count each match individually. This isolates each match on a line of its own. You may then count the number of matches by passing the result through wc -l. This uses wc to do the actual counting, not grep. However, you could cheat and use grep -c . in place of wc -l to count the number of non-empty lines returned from the first grep. Since that is a bit of a hack, and since wc -l does literally what we want, we'll use wc in the examples below.

See the manuals for grep and wc on your system.

Example: The number of lines matching the pattern G in file:

$ grep -c -e G file
7

Example: The number of matches in the same file, but counting each match individually:

$ grep -o -e G file | wc -l
18
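For anyone who wants to try this without a file at hand, here is a minimal sketch on made-up input. The three sample lines and the variable names are my own assumptions, not data from the question; the arithmetic expansion around wc -l is only there to strip the padding some wc implementations add.

```shell
# Hypothetical sample input: 3 lines, 2 of them contain "G", 4 occurrences of "G" in total.
sample=$(printf '%s\n' 'GATTGACA' 'ACTT' 'GGC')

# Number of lines that contain at least one match.
lines=$(printf '%s\n' "$sample" | grep -c -e G)

# Number of individual matches: -o prints each match on its own line, wc -l counts them.
matches=$(( $(printf '%s\n' "$sample" | grep -o -e G | wc -l) ))

# The "cheat" variant: count the non-empty lines produced by grep -o with a second grep.
matches_alt=$(printf '%s\n' "$sample" | grep -o -e G | grep -c .)

echo "matching lines:     $lines"
echo "individual matches: $matches (via wc -l), $matches_alt (via grep -c .)"
```

On this sample, the line count is 2 and both match counts are 4.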

Beware: grep -o only prints the non-empty matches. For instance seq 10 | grep -c '^' prints 10 but seq 10 | grep -o '^' | wc -l prints 0.
– Stéphane Chazelas, Commented Oct 6, 2023 at 21:17
There's also the usual question of whether overlapping matches should be counted (like if there are one or two occurrences of 99 in 999 or of aba in ababa).
– Stéphane Chazelas, Commented Oct 6, 2023 at 21:23
A perhaps less trivial example: seq 10 | grep -c '7*' prints 10, but seq 10 | grep -o '7*' | wc -l prints 1.
– G-Man Says 'Reinstate Monica', Commented Oct 14, 2023 at 6:28
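The two caveats from the comments above can be checked side by side with a short sketch like the one below. It assumes a grep that supports -o and skips empty matches the way the comments describe (GNU grep does); other implementations may behave differently. The variable names are illustrative only.

```shell
# Assumption: grep supports the non-standard -o option and omits empty matches (as GNU grep does).

# '^' matches (emptily) on every line, so -c counts all 10 lines ...
lines_caret=$(seq 10 | grep -c '^')
# ... but -o has nothing non-empty to print, so wc -l sees no lines.
matches_caret=$(( $(seq 10 | grep -o '^' | wc -l) ))

# '7*' also matches emptily on every line, so -c again counts all 10 lines ...
lines_star=$(seq 10 | grep -c '7*')
# ... while -o only prints the single non-empty match, the "7" on line 7.
matches_star=$(( $(seq 10 | grep -o '7*' | wc -l) ))

echo "pattern ^ : $lines_caret lines, $matches_caret non-empty matches"
echo "pattern 7*: $lines_star lines, $matches_star non-empty matches"
```

So whenever the pattern can match the empty string, grep -o | wc -l counts only the non-empty matches.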

Prabhjot Singh · Answer · 2023-11-02 15:06:56Z

Using awk:

$ awk '{a += gsub(/pat/,"&"); } END{print a}' file

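Or, to count the whitespace-separated fields that contain a match (each field is counted at most once, however many matches it holds):

$ awk '{for(i=1;i<=NF;i++)if ($i ~ /pat/) ++a}END{print a}' file

To count overlapping matches, here is a slightly modified version of the command from this answer: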

$ echo abababa | awk '{ while (a=index($0,"aba")) {++count; $0=substr($0,a+1)}}END{print count}'
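To see how the non-overlapping and overlapping counts differ, here is a small sketch that runs both approaches on the same input. The variable names are my own, and I print n + 0 so that a count of zero shows up as 0 rather than an empty line; that is a small deviation from the commands above.

```shell
text='abababa'

# Non-overlapping count: gsub() returns the number of substitutions it made;
# replacing each match with itself ("&") leaves the line unchanged.
non_overlapping=$(printf '%s\n' "$text" |
  awk '{ n += gsub(/aba/, "&") } END { print n + 0 }')

# Overlapping count: find the fixed string with index(), then restart the search
# one character past the start of the match instead of past its end.
overlapping=$(printf '%s\n' "$text" |
  awk '{ s = $0
         while ((p = index(s, "aba")) > 0) { n++; s = substr(s, p + 1) } }
       END { print n + 0 }')

echo "non-overlapping: $non_overlapping"
echo "overlapping:     $overlapping"
```

For abababa this gives 2 non-overlapping and 3 overlapping occurrences of aba. Note that index() searches for a fixed string, not a regular expression.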

Stéphane Chazelas · Answer · 2023-10-12 08:18:39Z

With perl, you could do:

perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='perl regex'

That has the advantage of also counting empty matches such as:

$ seq 10 | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='\b'
20

(20 word boundaries in the contents of the lines of the output of seq 10).

With perl regexps, you can also handle some cases of overlapping matches by using look-around operators:

$ echo abababa | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='aba'
2

$ echo abababa | perl -lsne '$count++ while m{$regex}g; END{print +$count}' -- -regex='(?=aba)'
3

Instead of matching occurrences of aba, the second command matches the positions within the line where aba can be seen ahead.
