Grep files containing two or more occurrences of a specific string

Question

I need to find files where a specific string appears twice or more.

For example, for three files:

File 1:

Hello World!

File 2:

Hello World! Hello !

File 3:

Hello World! Hello Hello Again.

--

I want to grep Hello and only get files 2 & 3.

@Melanie Shebel - not really sure what you are looking for. It may be good to know if multiple matches in the same line should be considered or not, for example.
– fedorqui, Commented Oct 21, 2015 at 16:26
I have some files that contain "calculation completed" once and some that contain "calculation completed" twice. I need to pull a list of the files that contain the string twice. The strings appear on separate lines.
– Melanie Shebel, Commented Oct 23, 2015 at 4:43
Then all of the answers below will work. What more do you need?
– Hans Lub, Commented Oct 23, 2015 at 6:50
@MelanieShebel ok. Adding a bounty is nice, even though I guess you could have asked a new question to have more control over the possible solutions and desired output.
– fedorqui, Commented Oct 23, 2015 at 12:54

Benjamin Loison · Accepted Answer · 2025-03-31 08:23:11Z

40

+50

What about this:

grep -o -c Hello * | awk -F: '{if ($2 > 1){print $1}}'

edited Mar 31 at 8:23

Benjamin Loison

answered May 30, 2014 at 18:40

John C

4 Comments

bstar55 Over a year ago

This will tell us which files contain at least 2 lines containing the word 'Hello'. What if a file has the line Hello Hello World? It won't get listed.

Jotne Over a year ago

This should be ($2 > 1) or it will only print file with 3 or more hits.

Hubert Léveillé Gauvin Over a year ago

@bstar55 Sorry if that was ambiguous. The way the files are designed this issue isn't going to be a problem.

Dark Cadmium Orange Over a year ago

This could use some explanation. Does the -o flag actually do anything here?
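
To illustrate the point raised in the comments above: grep -c counts matching lines, not individual matches, so adding -o does not change the count. A minimal sketch that counts every occurrence instead, by letting grep -o print each match on its own line and counting those lines with wc -l. The function name files_with_min_matches and its arguments are made up for this example.

```shell
# Hypothetical helper (not from the thread): print each file in which
# PATTERN occurs at least MIN times, counting every occurrence,
# including several on the same line.
# Usage: files_with_min_matches PATTERN MIN FILE...
files_with_min_matches() {
  pattern=$1
  min=$2
  shift 2
  for file in "$@"; do
    [ -f "$file" ] || continue   # skip directories and missing files
    # grep -o prints every match on its own line; wc -l counts those lines.
    # The arithmetic expansion strips padding that some wc versions add.
    n=$(( $(grep -o -e "$pattern" -- "$file" | wc -l) ))
    if [ "$n" -ge "$min" ]; then
      printf '%s\n' "$file"
    fi
  done
}
```

Called as files_with_min_matches Hello 2 * on the three files from the question, this should list files 2 and 3 even though each of them has only one matching line.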

Benjamin Loison · Accepted Answer · 2025-03-31 08:23:28Z

Since the question is tagged grep, here is a solution using only that utility and bash (no awk required):

#!/bin/bash
for file in *
do
  if [ "$(grep -c "Hello" "${file}")" -gt 1 ]
  then
    echo "${file}"
  fi
done

Can be a one-liner:

for file in *; do if [ "$(grep -c "Hello" "${file}")" -gt 1 ]; then echo "${file}"; fi; done

Explanation

You can replace the * in for file in * with whatever shell expansion selects your data files.
grep -c returns the number of lines that match the pattern; multiple matches on one line still count as a single matched line.
if [ ... -gt 1 ] tests whether more than one line matched in the file. If so:
echo "${file}" prints the file name.
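
If the data files live in a directory tree rather than in the current directory, the same grep -c test can be driven by find instead of a shell glob. The following is only a sketch under that assumption; the function name find_multi_line_matches and its two arguments are hypothetical.

```shell
# Hypothetical helper (not from the thread): list regular files below DIR
# that have more than one line matching PATTERN.
# Usage: find_multi_line_matches PATTERN DIR
find_multi_line_matches() {
  pattern=$1
  dir=$2
  # find hands the file names to a small inline script in batches;
  # the pattern is passed as its first argument.
  find "$dir" -type f -exec sh -c '
    pattern=$1
    shift
    for file in "$@"; do
      # grep -c prints the number of matching lines in this one file
      if [ "$(grep -c -e "$pattern" -- "$file")" -gt 1 ]; then
        printf "%s\n" "$file"
      fi
    done
  ' sh "$pattern" {} +
}
```

For the case described in the question's comments, find_multi_line_matches "calculation completed" . would be the corresponding call. Like the loop above, it counts lines, not occurrences.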

Jotne · Accepted Answer · 2014-05-30 19:54:28Z

This awk will print the file name of all files with 2 or more Hello

awk 'FNR==1 {if (a>1) print f;a=0} /Hello/ {a++} {f=FILENAME} END {if (a>1) print f}' *
file2
file3

Benjamin Loison · Accepted Answer · 2025-03-31 08:23:43Z

What you need is a grep that can recognise patterns across line endings ("hello" followed by anything (possibly even line endings), followed by "hello").

As grep processes your files line by line, it is (by itself) not the right tool for the job - unless you manage to cram the whole file into one single line.

Now, that is easy, for example using the tr command, replacing line endings by spaces:

if tr '\n' ' ' < "$file" | grep -q 'hello.*hello'
then
  echo "$file matches"
fi

This is quite efficient, even on large files with many (say 100000) lines. It can also be run with --max-count=1, which makes grep stop searching after a match has been found, although grep -q already exits at the first match. It doesn't matter whether the two hellos are on the same line or not.
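
To apply this to a whole set of files, the test can be wrapped in a loop. This is a rough sketch, not part of the original answer; the function name files_matching_across_lines is invented, and the word is used as a regular expression, so metacharacters in it keep their special meaning.

```shell
# Hypothetical helper (not from the thread): print each file in which WORD
# occurs at least twice, on the same line or on different lines.
# Usage: files_matching_across_lines WORD FILE...
files_matching_across_lines() {
  word=$1
  shift
  for file in "$@"; do
    [ -f "$file" ] || continue   # skip directories and missing files
    # tr joins the whole file into one line, so .* can span the
    # original line breaks; grep -q only sets the exit status.
    if tr '\n' ' ' < "$file" | grep -q -e "$word.*$word"; then
      printf '%s\n' "$file"
    fi
  done
}
```

For example, files_matching_across_lines hello * would list every file in the current directory that contains hello twice or more.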

Benjamin Loison · Accepted Answer · 2025-03-31 08:23:55Z

After reading your question, I think you also want to catch the case where hello appears twice on one line ("find files where a specific string appears twice or more"), so I came up with this one-liner:

awk -v p="hello" 'FNR==1{x=0}{x+=gsub(p,p);if(x>1){print FILENAME;nextfile}}' *

In the line above, p is the pattern you want to search for.
It prints the file name if the file contains the pattern two or more times, whether the matches are on the same line or on different lines.
As soon as two or more matches have been found, it prints the file name, stops processing the current file, and moves on to the next input file, if there is one. This helps if you have big files.

A little test:

kent$ head f*
==> f <==
hello hello world
==> f2 <==
hello
==> f3 <==
hello hello

SK-Arch 22:27:00 /tmp/test
kent$ awk -v p="hello" 'FNR==1{x=0}{x+=gsub(p,p);if(x>1){print FILENAME;nextfile}}' f*
f
f3
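
One thing to keep in mind with the one-liner above is that gsub treats p as a regular expression, so a search string such as a.b would also count axb. If the string should be taken literally, a variant based on awk's index() function might look like the sketch below. The function name files_with_literal_twice and the NEEDLE environment variable are assumptions made for this example, and it avoids nextfile by skipping the remaining lines of a file that was already reported.

```shell
# Hypothetical helper (not from the thread): print files that contain the
# literal string NEEDLE at least twice, on the same or on different lines.
# Usage: files_with_literal_twice NEEDLE FILE...
files_with_literal_twice() {
  needle=$1
  shift
  # The string is passed through the environment so that awk does not
  # interpret backslash escapes in it, as it would with -v.
  NEEDLE=$needle awk '
    BEGIN { s = ENVIRON["NEEDLE"]; if (s == "") exit 1 }
    FNR == 1 { n = 0; reported = 0 }   # new file: reset the per-file state
    reported { next }                  # this file was already printed
    {
      line = $0
      # index() does a plain substring search, so regex metacharacters
      # in the string have no special meaning here
      while ((i = index(line, s)) > 0) {
        n++
        line = substr(line, i + length(s))
      }
      if (n > 1) { print FILENAME; reported = 1 }
    }
  ' "$@"
}
```

With the test files shown above, files_with_literal_twice hello f* should report the same files as the gsub version.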

Thanks @Kent ! In my specific example I'll never have the string twice in a row, but it's good to know.

Pere · Accepted Answer · 2017-08-04 21:30:56Z

1

Another way:

grep Hello * | cut -d: -f1 | uniq -d

Grep for lines containing 'Hello'; keep only the file names; print only the duplicates.
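
Two caveats may be worth noting. When grep is given a single file it omits the file name prefix, so cut would return part of the matching line instead, and a file name containing a colon is cut short. A small sketch that addresses the first point with -H, an option available in GNU and BSD grep but not required by POSIX; the function name files_with_repeated_lines is made up.

```shell
# Hypothetical helper (not from the thread): print files that have at least
# two lines matching PATTERN. Assumes file names contain no colon.
# Usage: files_with_repeated_lines PATTERN FILE...
files_with_repeated_lines() {
  pattern=$1
  shift
  # -H forces the "file:" prefix even when only one file is given;
  # cut keeps the name, and uniq -d prints each name that repeats.
  grep -H -e "$pattern" -- "$@" | cut -d: -f1 | uniq -d
}
```

Because grep prints all matches of one file before moving to the next, identical names are adjacent and uniq -d needs no prior sort.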

edited Aug 4, 2017 at 21:30

answered Aug 4, 2017 at 21:23

Pere

1 Comment

F. Hauri - Give Up GitHub Over a year ago

First time I used the -d switch of the uniq command! Interesting!

Benjamin Loison · Accepted Answer · 2025-03-31 08:24:14Z

Piping to a scripting language might be overkill, but it's oftentimes much easier than just using awk.

grep -rnc "Hello" . | ruby -ne 'file, count = $_.split(":"); puts "#{file}: #{count}" if count&.to_i >= 2'

So for your input, we get

$ grep -rnc "Hello" . | ruby -ne 'file, count = $_.split(":"); puts "#{file}: #{count}" if count&.to_i >= 2'
./2: 2
./3: 3

Or to omit the count

grep -rnc "Hello" . | ruby -ne 'file, count = $_.split(":"); puts file if count&.to_i >= 2'

Benjamin Loison · Accepted Answer · 2025-03-31 08:24:27Z

0

grep -c Hello * | egrep -v ':[01]$' | sed 's/:[0-9]*$//'

edited Mar 31 at 8:24

Benjamin Loison

answered Feb 23, 2017 at 20:30

Chaim Geretz
