
I have several documents hosted on a cloud instance. I want to extract all words conforming to a specific pattern into a .txt file. This is the pattern:

ABC123A ABC123B ABC765A 

and so on. Essentially, the words start with the fixed character string 'ABC', followed by a fixed number of digits, and end with a letter. This is my code:

grep -oh ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt 

When I execute the query, it runs for several hours without generating any output. I have over 1100 documents. However, when I run this query:

grep -r ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt 

the list of files with the strings is generated in a matter of seconds.

What do I need to correct in my query? Also, what is causing the delay?

UPDATE 1

Based on the answers, it's evident that the command is missing the name of the file it should run on. I want to run the command on multiple document files (>1000).

The documents I want searched are in multiple sub-directories within a directory. What is a good way to search through them? Doing

grep -roh ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt 

only returns the file names.

UPDATE 2

If I use the updated code from the answer below:

find . -exec grep -oh "ABC[0-9].*[a-zA-Z]$" >> ~/abcLetterMatches.txt {} \; 

I get a "no file or directory" error.

UPDATE 3

The pattern can be anywhere in the line.

3 Comments

  • ‘runs for several hours without generating any output’ That's because it's waiting for input. You didn't tell grep where to look, so it's reading STDIN. You'll want to do grep <pattern> <file>. Commented Sep 9, 2016 at 20:58
  • Your pattern will match things that aren't like your examples, e.g. ABC1fooA. It just requires a single digit after ABC, then anything. Commented Sep 9, 2016 at 21:02
  • Your pattern will extract ABC123A from strings like 356XYZABC123A. Is this intended? Commented Sep 9, 2016 at 21:12

3 Answers


You can use this regexp:

grep -E "^ABC[0-9]{3}[A-Z]$" docs > filename

ABC123A
ABC123B
ABC765A

1 Comment

Will match everything that contains your pattern such as XYZABC123A2356fghf65.

There is no delay; grep is simply waiting for input. Because you gave it no file to search, it reads standard input by default. You can correct your command by supplying a file name as an argument:

grep -oh "ABC[0-9].*[a-zA-Z]$" file.txt > /home/user/abcLetterMatches.txt 

Source (man grep):

SYNOPSIS
       grep [OPTIONS] PATTERN [FILE...]
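
To illustrate the point about standard input, here is a minimal sketch. It assumes a POSIX shell and a grep that supports -o and -h (GNU and BSD grep do). The sample text and the temporary file are made up for the demonstration. With no FILE argument, grep only has something to read when input is piped to it; with a FILE argument, it reads the file.

```shell
# No FILE argument: grep reads standard input, so something must be piped in.
# Without the pipe it would sit waiting on the terminal.
from_stdin=$(printf 'ABC123A\nnoise\n' | grep -oh "ABC[0-9].*[a-zA-Z]$")

# Same pattern with an explicit FILE argument (a throwaway temporary file).
tmpfile=$(mktemp)
printf 'ABC123A\nnoise\n' > "$tmpfile"
from_file=$(grep -oh "ABC[0-9].*[a-zA-Z]$" "$tmpfile")

# Both invocations report the same match.
printf '%s\n%s\n' "$from_stdin" "$from_file"
rm -f "$tmpfile"
```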

To run the same grep on several files recursively, combine it with the find command:

find . -exec grep -oh "ABC[0-9].*[a-zA-Z]$" >> ~/abcLetterMatches.txt {} \; 
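
A possible variant of the find command, offered as a sketch and not as the definitive fix: it restricts find to regular files with -type f, moves the redirection to the end of the command, and writes the output outside the searched tree so that grep never reads its own results. The directory layout and file names below are hypothetical and exist only to make the example self-contained. Whether any of this explains the error reported in the question is not certain.

```shell
# Hypothetical sample tree, created only so the sketch can run on its own.
workdir=$(mktemp -d)
mkdir -p "$workdir/docs/sub"
printf 'ABC123A\nnoise\n' > "$workdir/docs/a.txt"
printf 'ABC765B\n' > "$workdir/docs/sub/b.txt"

# -type f skips directories; "{} +" hands many files to each grep call
# instead of starting one grep per file.
# The output file lives outside the directory being searched.
find "$workdir/docs" -type f -exec grep -oh "ABC[0-9].*[a-zA-Z]$" {} + > "$workdir/matches.txt"

# Show the collected matches in a stable order.
sort "$workdir/matches.txt"
```

Using "{} +" in place of "{} \;" is a design choice: with more than a thousand documents, batching files into a few grep invocations avoids the cost of spawning a process per file.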

6 Comments

Put the pattern in quotes.
@Barmar, just did it.
I see. I want to run the command on a bunch of directories that have the documents. Checking by individual file is not a feasible option given that there are so many of them. I'm editing the question to include this information.
@kurious I have added it to my answer.
I tried -roh. It writes the file name, not the exact matches.

This does what you ask for:

grep -hr '^ABC[0-9]\{3\}[A-Za-z]$' 
  • -h to not get the filenames.
  • -r to search recursively. If no directory is given (as above) the current one is used. Otherwise just specify one as the last argument.
  • Quotes around the pattern to avoid accidental globbing, etc.
  • ^ at the beginning of the pattern, together with $ at the end, to match only whole lines. (Not sure if this was a requirement, but the sample data suggests it.)
  • \{3\} to specify that there should be three digits.
  • No .* as that would match a whole lot of other things.
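
If the pattern may appear anywhere in a line, as the question's third update says, one possible adaptation is to drop the ^ and $ anchors and add -o so that only the matched text is printed. The sketch below also adds -w, which is an assumption on my part: it requires each match to be a whole word, so ABC123A inside 356XYZABC123A is skipped. Remove -w if such embedded matches are wanted. The sample files are hypothetical.

```shell
# Hypothetical sample tree, created only so the sketch can run on its own.
workdir=$(mktemp -d)
mkdir -p "$workdir/docs/sub"
printf 'see ABC123A and ABC765B here\n' > "$workdir/docs/a.txt"
printf 'ABC1fooA\nref ABC999Z.\n' > "$workdir/docs/sub/b.txt"

# -r recurse, -h omit file names, -o print only the matched text,
# -w accept a match only if it is a whole word (assumption, see above).
matches=$(grep -rhow 'ABC[0-9]\{3\}[A-Za-z]' "$workdir/docs" | sort)

# ABC1fooA is not reported because it lacks three digits after ABC.
printf '%s\n' "$matches"
```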

1 Comment

The pattern can be anywhere in the line. I have updated the question accordingly. It would be great if you could edit your answer to match.
