How to print only unique matches from a regular expression?

Question

Suppose I have a file with the following text:

Number_1
Number_3
Number_1
Number_4

How can I use a regexp to print each distinct Number_n only once? Using:

grep -oE "Number_\w+"

Gives me back all the matches:

Number_1
Number_3
Number_1
Number_4

But I want the following output:

Number_1
Number_3
Number_4

Not quite clear. You just want the first match, from the first matching line? Or what? – Wildcard, commented Apr 4, 2017 at 21:55
@user9008, well, it seems your question wasn't interpreted as you meant it to be. I updated my answer. – ilkkachu, commented Apr 6, 2017 at 8:02

steve · Accepted Answer · 2017-04-04 22:20:26Z

6

grep -oE "Number_\w+" | sort -u
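As a minimal sketch of how this pipeline behaves on the sample data from the question, the snippet below writes the four lines to a temporary file and runs the same command against it. The temporary directory, the file name and the `unique_matches` variable are assumptions made for illustration. It also spells `\w` as `[[:alnum:]_]`, since `\w` inside `grep -E` is an extension that not every grep supports.

```shell
# Build a small sample file (hypothetical name, created in a temp dir).
tmpdir=$(mktemp -d)
printf '%s\n' Number_1 Number_3 Number_1 Number_4 > "$tmpdir/file"

# -o prints only the matched text; sort -u drops duplicates (and sorts).
# [[:alnum:]_] matches the same characters as \w without relying on the extension.
unique_matches=$(grep -oE 'Number_[[:alnum:]_]+' "$tmpdir/file" | sort -u)
printf '%s\n' "$unique_matches"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Note that the result comes out in sorted order, not in the order the matches first appeared in the file.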


Community · Accepted Answer · 2017-04-13 12:37:04Z

(Oh okay, the edit changes the question a bit.)

The easy way to print only one copy of each output line is to pipe through sort -u (or sort | uniq), though that will obviously sort the output.

Other related solutions here: Printing unique lines
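If the sorting is unwanted, one common alternative (not part of the answer above) is to filter through awk and print a line only the first time it is seen. The sketch below uses made-up input whose first-seen order differs from its sorted order; the `first_seen` variable and the array name `seen` are illustrative.

```shell
# Made-up input: matches appear in the order 3, 1, 3, 4.
first_seen=$(printf '%s\n' Number_3 Number_1 Number_3 Number_4 |
  grep -oE 'Number_[0-9]+' |
  awk '!seen[$0]++')   # true only the first time a given line is seen

# Prints Number_3, Number_1, Number_4, one per line.
printf '%s\n' "$first_seen"
```

Unlike sort -u, this keeps each match at the position of its first occurrence, at the cost of holding every distinct line in memory.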

(The answer to what I originally thought the question was:)

To print only the first string that matches the regex, we can use grep -m1 ...:

-m NUM, --max-count=NUM Stop reading a file after NUM matching lines.

If the matches are on different lines, that works directly. But if several matching strings appear on the same line, -o prints all of them, so add something like | head -n 1.
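A small sketch of that difference, assuming a grep that supports both -o and -m (GNU and BSD grep do). The input text and the variable names `per_line` and `first_only` are invented for the example.

```shell
# Two matches on the first line, one on the second (made-up input).
input='Number_7 Number_2
Number_9'

# -m1 stops after the first matching line, but -o still prints both matches from it.
per_line=$(printf '%s\n' "$input" | grep -oE -m1 'Number_[0-9]+')

# head -n 1 keeps only the very first match.
first_only=$(printf '%s\n' "$input" | grep -oE -m1 'Number_[0-9]+' | head -n 1)

printf 'with -m1 only:\n%s\n' "$per_line"
printf 'with head -n 1:\n%s\n' "$first_only"
```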

Kamaraj · Accepted Answer · 2017-04-06 08:58:38Z

$ awk '{print $NF}' file | sort -u
Number_1
Number_3
Number_4

$ awk '{Arr[$NF]++}END{for(i in Arr)print i}' file
Number_3
Number_4
Number_1

jubilatious1 · Accepted Answer · 2022-06-20 06:29:34Z

Using Raku (formerly known as Perl_6)

~$ echo Number_1 Number_2 Number_1 | raku -e '.put for lines.comb(/ Number_ \d+ /).unique;'

Outputs:

Number_1
Number_2

Raku implements a comb function that takes a regex matcher. Think of comb as the opposite of split: instead of describing the separators to discard, you define a regex for the elements you want and Raku returns those.

Sample Input (from a file):

~$ cat file
1. Number_1
2. Number_3
3. Number_1
4. Number_4

Sample Output #1 (use regex to drop line_numbers):

~$ raku -e '.put for lines.comb(/ Number_ \d+ /).unique;' file
Number_1
Number_3
Number_4

Sample Output #2 (use <(…)> capture-markers to identify-then-drop the "Number_" text):

~$ raku -e '.put for lines.comb(/ Number_ <(\d+)> /).unique;' file
1
3
4

https://raku.org

Kusalananda · Accepted Answer · 2022-06-20 07:41:41Z

Using jq:

$ cat file
Number_1
Number_3
Number_1
Number_4

$ jq -n -R -r '[inputs | select(test("^Number_\\d+$"))] | unique[]' file
Number_1
Number_3
Number_4

or, with the regular expression given on the command line,

$ jq -r -R -n --arg re '^Number_\d+$' '[inputs | select(test($re))] | unique[]' file
Number_1
Number_3
Number_4

This selects the lines matching the PCRE-style regular expression ^Number_\d+$ (jq uses the Oniguruma regex library). It then collects the matching lines into an array and applies unique, which removes duplicates and returns the remaining lines in sorted order.
