3

Suppose I have a file with the following text:

  1. Number_1
  2. Number_3
  3. Number_1
  4. Number_4

How can I use a regexp to print each distinct Number_n only once? Using:

grep -oE "Number_\w+" 

Gives me back all the matches:

Number_1
Number_3
Number_1
Number_4

But I want the following output:

Number_1
Number_3
Number_4

2
  • 1
    Not quite clear. You just want the first match, from the first matching line? Or what? Commented Apr 4, 2017 at 21:55
  • @user9008, well, it seems your question wasn't interpreted as you meant it to be. I updated my answer. Commented Apr 6, 2017 at 8:02

5 Answers

6
grep -oE "Number_\w+" | sort -u 
0
2

(Oh okay, the edit changes the question a bit.)

The easy way to print only one copy of each output line is to pipe through sort -u (or sort | uniq), though that will obviously sort the output.

Other related solutions here: Printing unique lines
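
If the order of first appearance matters, a common alternative to sorting is to filter duplicates with awk. The following is a minimal sketch, not the only way to do it: the input is hypothetical (deliberately out of order so the difference shows), the variable names are made up for illustration, and it assumes a grep that supports -o.

```shell
# Hypothetical input, deliberately out of order so the difference is visible.
input=$(printf '%s\n' '1. Number_4' '2. Number_1' '3. Number_4' '4. Number_3')

# sort -u: one copy of each match, but in sorted order.
sorted=$(printf '%s\n' "$input" | grep -oE 'Number_[[:alnum:]_]+' | sort -u)

# awk '!seen[$0]++': one copy of each match, in order of first appearance.
# seen[$0]++ evaluates to 0 the first time a line is seen, so the negation
# is true only for the first occurrence and awk prints just that one.
first_seen=$(printf '%s\n' "$input" | grep -oE 'Number_[[:alnum:]_]+' | awk '!seen[$0]++')

printf 'sort -u:\n%s\n' "$sorted"
printf 'first seen:\n%s\n' "$first_seen"
```

Here sort -u yields Number_1, Number_3, Number_4, while the awk filter yields Number_4, Number_1, Number_3.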


(The answer to what I originally thought the question was:)

To print only the first string that matches the regex, we can use grep -m1 ...:

-m NUM, --max-count=NUM
    Stop reading a file after NUM matching lines.

If each match is on its own line, that works directly. If a single line contains several matches, though, -o still prints all of them, so add something like | head -n 1.
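
A small sketch of that difference, using a hypothetical two-line input where the first line holds two matches (the variable names are illustrative, and -m and -o are assumed to be available, as in GNU grep):

```shell
# Hypothetical input: two matches on the first line, one on the second.
input=$(printf '%s\n' 'Number_1 Number_2' 'Number_3')

# -m1 stops after the first matching *line*, but -o still prints
# every match found on that line.
first_line=$(printf '%s\n' "$input" | grep -m1 -oE 'Number_[[:alnum:]_]+')

# Piping through head -n 1 keeps only the first printed match.
first_match=$(printf '%s\n' "$input" | grep -m1 -oE 'Number_[[:alnum:]_]+' | head -n 1)

printf 'with -m1 only:\n%s\n' "$first_line"
printf 'with head -n 1:\n%s\n' "$first_match"
```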

0
$ awk '{print $NF}' file | sort -u
Number_1
Number_3
Number_4
$ awk '{Arr[$NF]++}END{for(i in Arr)print i}' file
Number_3
Number_4
Number_1
0

Using Raku (formerly known as Perl_6)

~$ echo Number_1 Number_2 Number_1 | raku -e '.put for lines.comb(/ Number_ \d+ /).unique;' 

Outputs:

Number_1
Number_2

Raku implements a comb function that takes a regex matcher. Think of comb as the opposite of split: you define a regex and Raku selects out those elements for you (i.e. breaking around the desired textual elements, quite the opposite of split).

Sample Input (from a file):

~$ cat file
1. Number_1
2. Number_3
3. Number_1
4. Number_4

Sample Output #1 (use regex to drop line_numbers):

~$ raku -e '.put for lines.comb(/ Number_ \d+ /).unique;' file
Number_1
Number_3
Number_4

Sample Output #2 (use <()> capture-markers to identify-then-drop the "Number_" text):

~$ raku -e '.put for lines.comb(/ Number_ <(\d+)> /).unique;' file
1
3
4

https://raku.org

0

Using jq:

$ cat file
Number_1
Number_3
Number_1
Number_4
$ jq -n -R -r '[inputs | select(test("^Number_\\d+$"))] | unique[]' file
Number_1
Number_3
Number_4

or, with the regular expression given on the command line,

$ jq -r -R -n --arg re '^Number_\d+$' '[inputs | select(test($re))] | unique[]' file
Number_1
Number_3
Number_4

This selects the lines that match the regular expression ^Number_\d+$ in full (jq uses the Oniguruma regex library, whose syntax is largely Perl-compatible). The unique filter then sorts the matches and removes duplicates, leaving one instance of each matching line.
