I want to find duplicates in a file and add a character to the end of the line on the 1st match

Question

I am trying to find duplicates in a file and, once a match is found, mark the first match with a character or word at the end of the line.

For example, my file (test.html) contains the following entries:

host= alpha-sfserver1
host= alphacrest3
host= alphacrest4
host= alphactn1
host= alphactn2
host= alphactn3
host= alphactn4
down alphacrest4

I can find the duplicate using the following (I use $2 because the duplicate will always be in column 2):

awk '{if (++dup[$2] == 1) print $0;}' test.html

It removes the last entry (down alphacrest4), but what I also want is to mark the duplicate entry with a word or character, such as:

host= alphacrest4 acked

Any help is most welcome.

I just named it test.html. I should have called it test.txt for all the people who really care about its name. :-)
– Sean, Commented Jun 3, 2013 at 15:50

Hauke Laging · Accepted Answer · 2013-06-03 15:46:27Z

You need to process the file twice. In the first run you write the dupes into a file:

awk '{if (++dup[$2] == 1) print $2;}' test.html > dupes.txt

The second run compares all lines against the file contents:

awk 'BEGIN { while ((getline var < "dupes.txt") > 0) { dup2[var] = 1 } }
{
    num = ++dup[$2]
    if (num == 1) {
        if (1 == dup2[$2]) print $0 " acked"; else print $0
    }
}' test.html

Hi Hauke, that almost worked. I just had to change awk '{if (++dup[$2] == 1) print $2;}' test.html > dupes.txt to awk '{if (++dup[$2] == 2) print $2;}' test.html > dupes.txt
– Sean, Commented Jun 3, 2013 at 16:21
Sorry, I should have said thanks for your help and very quick reply.
– Sean, Commented Jun 3, 2013 at 16:30
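
Putting the answer and the correction from the comments together, a minimal sketch of the two-pass approach might look like the following. The function name mark_first_dupes, the use of mktemp for the intermediate file, and passing its path with -v are illustrative choices, not part of the original answer; the sample data is the file from the question.

```shell
#!/bin/sh
# Sketch only: mark_first_dupes is a hypothetical helper name.
mark_first_dupes() {
    input=$1
    dupes=$(mktemp) || return 1

    # Pass 1: write a column-2 value the second time it is seen, so the
    # temporary file lists only values that occur more than once.
    awk '++seen[$2] == 2 { print $2 }' "$input" > "$dupes"

    # Pass 2: print only the first line for each column-2 value and
    # append " acked" when that value is listed in the dupes file.
    awk -v dupes="$dupes" '
        BEGIN { while ((getline v < dupes) > 0) dup2[v] = 1 }
        ++seen[$2] == 1 {
            suffix = ($2 in dup2) ? " acked" : ""
            print $0 suffix
        }' "$input"

    rm -f "$dupes"
}

# Sample data taken from the question.
sample=$(mktemp)
printf '%s\n' 'host= alpha-sfserver1' 'host= alphacrest3' 'host= alphacrest4' \
    'host= alphactn1' 'host= alphactn2' 'host= alphactn3' 'host= alphactn4' \
    'down alphacrest4' > "$sample"

mark_first_dupes "$sample"
rm -f "$sample"
```

With the sample data, the line host= alphacrest4 should come out with " acked" appended, and the later down alphacrest4 line is dropped, as in the original awk command from the question.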

terdon · Accepted Answer · 2013-06-03 16:05:32Z

This would be much easier if we had the entire file. Are you only interested in lines beginning with host= or any of the 2nd fields? For a general solution, try this:

perl -e '@file=<>; foreach(map{/.+?\s+(.+)/;}@file){$dup{$_}++}; foreach(@file){ chomp; /.+?\s+(.+)/; if($dup{$1}>1 && not defined($p{$1})){ print "$_ acked\n"; $p{$1}++;} else{print "$_\n"} }' test.html

The script above first reads the entire file and counts how often each value occurs, then prints every line, appending "acked" to the first line of each duplicated value.
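
For readers more comfortable with awk, the same read-everything-then-print idea can be sketched roughly as below. This is not a translation of the Perl one-liner: it assumes the value to compare is simply column 2 (the Perl regex uses everything after the first run of whitespace), and the function name mark_buffered is made up for illustration.

```shell
#!/bin/sh
# Sketch only: mark_buffered is a hypothetical helper name.
# Buffers every line in memory, counts column-2 values, then prints all
# lines, appending " acked" to the first line of each repeated value.
mark_buffered() {
    awk '
        { line[NR] = $0; key[NR] = $2; count[$2]++ }
        END {
            for (i = 1; i <= NR; i++) {
                k = key[i]
                if (count[k] > 1 && !(k in marked)) {
                    print line[i] " acked"
                    marked[k] = 1      # mark only the first occurrence
                } else {
                    print line[i]
                }
            }
        }' "$@"
}

# Usage: mark_buffered test.html
# or, since nothing is read twice:  some_command | mark_buffered
```

Unlike the asker's original command, this keeps the later duplicate lines (such as down alphacrest4) in the output, which matches what the Perl script does.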

The whole thing is much simpler if we can assume you are only interested in lines starting with down X:

grep down test.html | awk '{print $2}' | perl -e 'while(<>){chomp; $dup{$_}++} open(A,"test.html"); while(<A>){ if(/host=\s+(.+)/ && defined($dup{$1})){ chomp; print "$_ acked\n"} else{print}}'

jaypal singh · Accepted Answer · 2013-06-03 18:33:01Z

This could help:

One-Liner:

awk 'NR==FNR{b[$2]++; next} $2 in b { if (b[$2]>1) { print $0" acked" ; delete b[$2]} else print $0}' inputFile inputFile

Explanation:

awk '
NR==FNR {
    ## Loop through the file and check which line is repeated based on column 2
    b[$2]++
    ## Skip the rest of the actions until complete file is scanned
    next
}
## Once the scan is complete, look for second column in the array
$2 in b {
    ## If the count of the column is greater than 1 it means there is duplicate.
    if (b[$2]>1) {
        ## So print that line with "acked" marker
        print $0" acked"
        ## and delete the array so that it is not printed again
        delete b[$2]
    }
    ## If count is 1 it means there was no duplicate so print the line
    else
        print $0
}' inputFile inputFile
