Using grep to identify a pattern

Question

I have several documents hosted on a cloud instance. I want to extract all words conforming to a specific pattern into a .txt file. This is the pattern:

ABC123A ABC123B ABC765A

and so on. Essentially, the words start with a specific character string 'ABC', have a fixed number of numerals, and end with a letter. This is my code:

grep -oh ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt

When I execute the query, it runs for several hours without generating any output. I have over 1100 documents. However, when I run this query:

grep -r ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt

the list of files with the strings is generated in a matter of seconds.

What do I need to correct in my query? Also, what is causing the delay?

UPDATE 1

Based on the answers, it's evident that the command is missing the name of the file it should run on. I want to run the code on multiple document files (>1000).

The documents I want searched are in multiple sub-directories within a directory. What is a good way to search through them? Doing

grep -roh ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt

only returns the file names.

UPDATE 2

If I use the updated code from the answer below:

find . -exec grep -oh "ABC[0-9].*[a-zA-Z]$" >> ~/abcLetterMatches.txt {} \;

I get a "no such file or directory" error.

UPDATE 3

The pattern can be anywhere in the line.

‘runs for several hours without generating any output’ That's because it's waiting for input. You didn't tell grep where to look, so it's reading STDIN. You'll want to do grep <pattern> <file>.
– Biffen, Commented Sep 9, 2016 at 20:58
Your pattern will match things that aren't like your examples, e.g. ABC1fooA. Your pattern just requires a single digit after ABC, then anything.
– Barmar, Commented Sep 9, 2016 at 21:02
Your pattern will extract ABC123A from patterns like 356XYZABC123A. Is this intended?
– alvits, Commented Sep 9, 2016 at 21:12
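
The over-matching described in these comments can be reproduced on a small sample. This is only a sketch: the sample file and its contents are made up for illustration, and the tighter pattern assumes "a fixed number of numerals" means exactly three, as in the examples in the question.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf 'ABC123A\nABC1fooA\nABC123A trailing text 9\n' > "$sample"

# Pattern from the question: one digit, then anything, then a letter at end of line.
loose=$(grep -o 'ABC[0-9].*[a-zA-Z]$' "$sample")

# Tighter pattern: exactly three digits followed by one letter, anywhere in the line.
strict=$(grep -o 'ABC[0-9]\{3\}[a-zA-Z]' "$sample")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern accepts ABC1fooA and misses the third line, because that line does not end in a letter. The strict pattern rejects ABC1fooA and finds ABC123A on both lines where it occurs.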

chenchuk · Accepted Answer · 2016-09-09 21:12:50Z


You can use this regexp:

grep -E "^ABC[0-9]{3}[A-Z]$" docs > filename

With the sample data, the matching lines are:

ABC123A
ABC123B
ABC765A

edited Sep 9, 2016 at 21:12

answered Sep 9, 2016 at 21:00


Will match everything that contains your pattern, such as XYZABC123A2356fghf65.
– alvits, Commented over a year ago

syntagma · Accepted Answer · 2016-09-09 21:39:12Z

There is no delay; grep is just waiting for input. You didn't give it a file, so by default it reads standard input. You can correct your command by supplying a filename argument:

grep -oh "ABC[0-9].*[a-zA-Z]$" file.txt > /home/user/abcLetterMatches.txt

Source (man grep):

SYNOPSIS grep [OPTIONS] PATTERN [FILE...]

To run the same grep on several files recursively, combine it with the find command:

find . -exec grep -oh "ABC[0-9].*[a-zA-Z]$" >> ~/abcLetterMatches.txt {} \;
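
A slightly more defensive variant of the same idea, as a sketch rather than a drop-in fix: it restricts find to regular files, batches files with `{} +`, and writes the output outside the searched tree so grep never reads its own results. The directory layout and file contents are hypothetical, and the pattern is the tighter three-digit form suggested elsewhere in this thread, not the one from the question.

```shell
# Hypothetical layout for illustration: a scratch directory with nested documents.
workdir=$(mktemp -d)
mkdir -p "$workdir/docs/sub"
printf 'ABC123A\nsee ABC765B here\n' > "$workdir/docs/a.txt"
printf 'nothing to see\nABC999Z\n' > "$workdir/docs/sub/b.txt"

# -type f skips directories; "{} +" passes many files to each grep invocation.
# -o prints only the matched text, -h suppresses file names.
find "$workdir/docs" -type f -exec grep -oh 'ABC[0-9]\{3\}[A-Za-z]' {} + > "$workdir/matches.txt"

# Show the collected matches in a stable order.
sort "$workdir/matches.txt"
```

Putting the redirection at the end makes it clearer that it applies to the whole find command, and using `>` once avoids appending to stale results from an earlier run.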

I see. I want to run the command on a bunch of directories that have the documents. Checking by individual file is not a feasible option given that there are so many of them. I'm editing the question to include this information.
I tried -roh. It writes the file names, not the exact matches.

Biffen · Accepted Answer · 2016-09-10 14:56:02Z

This does what you ask for:

grep -hr '^ABC[0-9]\{3\}[A-Za-z]$'

-h to not get the filenames.
-r to search recursively. If no directory is given (as above) the current one is used. Otherwise just specify one as the last argument.
Quotes around the pattern to avoid accidental globbing, etc.
^ at the beginning of the pattern to — together with $ at the end — only match whole lines. (Not sure if this was a requirement, but the sample data suggests it.)
\{3\} to specify that there should be three digits.
No .* as that would match a whole lot of other things.
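
Since the question's third update says the pattern can appear anywhere in a line, one possible adaptation is to drop the ^ and $ anchors and print only the matched text. The sketch below assumes GNU grep and assumes that matches should be whole words (via -w); the sample file is made up for illustration.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf 'ABC123A\nfoo ABC765B bar\n356XYZABC123A\nABC1fooA\n' > "$workdir/sample.txt"

# -r recurse, -h no file names, -o print only the matched text,
# -w accept a match only if it is a whole word.
grep -rhow 'ABC[0-9]\{3\}[A-Za-z]' "$workdir"
```

With -w, the ABC123A embedded in 356XYZABC123A is skipped. If embedded matches should be extracted as well, leaving out -w would be the change to make.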

The pattern can be anywhere in the line. I've updated the question accordingly. It would be great if you could edit your answer to match.
