
Is there a unix command that can check if any two lines in a file are the same?

For example, consider a file sentences.txt:

This is sentence X
This is sentence Y
This is sentence Z
This is sentence X
This is sentence A
This is sentence B

We see that the sentence

This is sentence X 

is repeated.

Is there any command that can quickly detect this, so that I could perhaps execute it like this:

$ cat sentences.txt | thecommand
Line 1:This is sentence X
Line 4:This is sentence X

3 Answers


Here is one way to get the exact output you're looking for:

$ grep -nFx "$(sort sentences.txt | uniq -d)" sentences.txt
1:This is sentence X
4:This is sentence X

Explanation:

The inner $(sort sentences.txt | uniq -d) lists each line that occurs more than once. The outer grep then searches sentences.txt again, printing every exact whole-line match (-x) against those fixed strings (-F) and prepending its line number (-n).

  • Your edit just barely beat me from posting the exact same answer. +1 Commented Feb 5, 2014 at 18:40
  • So the $(command) syntax works as a kind of replacement? Commented Feb 5, 2014 at 19:27
  • @CodeBlue - yes. It's called Command Substitution. Commented Feb 5, 2014 at 19:29
  • sort sentences.txt | uniq -d | grep -nFxf - sentences.txt would be a little more efficient and would avoid potential "argument list too long" problems. Commented Feb 6, 2014 at 9:34
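The variant from the last comment can be sketched as a runnable session (the printf line just recreates the example sentences.txt). Feeding the duplicate lines to grep on stdin via -f - means grep reads its patterns from a stream instead of the argument list:

```shell
# Recreate the example file from the question.
printf '%s\n' 'This is sentence X' 'This is sentence Y' \
              'This is sentence Z' 'This is sentence X' \
              'This is sentence A' 'This is sentence B' > sentences.txt

# -f - : read the patterns (the duplicated lines) from stdin.
sort sentences.txt | uniq -d | grep -nFxf - sentences.txt
# 1:This is sentence X
# 4:This is sentence X
```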

Not exactly what you want, but you can try combining sort and uniq -c -d:

aularon@aularon-laptop:~$ cat input
This is sentence X
This is sentence Y
This is sentence Z
This is sentence X
This is sentence A
This is sentence B
aularon@aularon-laptop:~$ sort input | uniq -cd
      2 This is sentence X
aularon@aularon-laptop:~$

The 2 here is the number of occurrences found for the line; from man uniq:

 -c, --count
        prefix lines by the number of occurrences
 -d, --repeated
        only print duplicate lines

If the file contents fit in memory, awk is good for this. The standard one-liner in comp.lang.awk (I can't search for an instance from this machine, but there are several every month) to just detect that there is duplication is awk 'n[$0]++'. It counts the occurrences of each line value and prints every occurrence other than the first, because the default action is print $0.
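The one-liner above can be demonstrated directly (the printf just recreates the example input). The pattern n[$0]++ is 0 (false) the first time a line is seen and nonzero (true) on every later occurrence, so only the repeats are printed:

```shell
# Print every occurrence of a line after the first one.
printf '%s\n' 'This is sentence X' 'This is sentence Y' \
              'This is sentence Z' 'This is sentence X' \
              'This is sentence A' 'This is sentence B' |
    awk 'n[$0]++'
# This is sentence X
```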

Showing all occurrences including the first, in your format (though possibly in mixed order when more than one value is duplicated), gets a little more finicky:

awk <sentences.txt '
    !($0 in n) { n[$0] = NR; next }                      # first sight: remember its line number
    n[$0]      { print "Line " n[$0] ":" $0; n[$0] = 0 } # second sight: print the first occurrence once
               { print "Line " NR ":" $0 }               # print this (repeated) occurrence
'

This is shown on multiple lines for clarity; you would usually run it together in real use. If you do this often you can put the awk script in a file and use awk -f, or of course put the whole thing in a shell script. Like most simple awk, this can be done very similarly with perl -n[a].
