
I am trying to make a simple command that can show me the duplicate data from one specific column and also give me the original line number.

Example of file:

JENNIE;30;DOCTOR;F
SARA;26;POLICE;F
EDWARD;32;TEACHER;M
ROBERT;44;POLICE;M

With the following command I will get the duplicates from column 3

cat FILE.txt |cut -d ";" -f3 |sort |uniq -d

The problem is that I need to get the original line number of the results.

My command shows:

POLICE
POLICE

And I want to get

2- POLICE
4- POLICE

  • Is the dash/hyphen an important part of your output, or would you be happy with "2 POLICE" etc.? Commented Apr 26, 2019 at 17:39
  • Is that really the output you get? I just get a single POLICE, and only if I sort first. Commented Apr 26, 2019 at 17:42
  • Umm... Your command outputs nothing as there are no consecutive lines that contain duplicated data in the 3rd column. Commented Apr 26, 2019 at 17:43

2 Answers

With GNU sort and GNU uniq, you could do:

$ <FILE.txt awk -F';' '{print NR"- "$3}' | sort -st' ' -k2 | uniq -Df1
2- POLICE
4- POLICE

Lines are sorted first lexically on the text and then by number (-s preserves the original order for texts that sort the same). Add a | sort -n to sort by line number.
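
To make the difference between the two orderings visible, here is a minimal sketch of the same pipeline with and without the trailing sort -n. It assumes GNU sort and GNU uniq, writes the question's sample to a temporary file, and adds a fifth record (ANNA, a hypothetical entry not in the original data) so that there are two groups of duplicates.

```shell
# Sample data from the question, plus one hypothetical record (ANNA)
# so that two professions are duplicated.
sample=$(mktemp)
printf '%s\n' \
    'JENNIE;30;DOCTOR;F' \
    'SARA;26;POLICE;F' \
    'EDWARD;32;TEACHER;M' \
    'ROBERT;44;POLICE;M' \
    'ANNA;51;DOCTOR;F' > "$sample"

# Prefix each profession with its line number, sort on the profession only
# (-s keeps equal professions in line order), then keep every line whose
# profession repeats (-D), ignoring the first field (-f1).
by_text=$(awk -F';' '{print NR"- "$3}' "$sample" \
    | sort -st' ' -k2 | uniq -Df1)

# Same pipeline, re-sorted numerically on the leading line number.
by_line=$(awk -F';' '{print NR"- "$3}' "$sample" \
    | sort -st' ' -k2 | uniq -Df1 | sort -n)

printf 'grouped by profession:\n%s\n' "$by_text"
printf 'ordered by line number:\n%s\n' "$by_line"
rm -f "$sample"
```

Without sort -n the duplicates stay grouped by profession (both DOCTOR lines, then both POLICE lines); with it they come back in file order.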

With awk alone:

awk -F';' '!x {c[$3]++}; x && c[$3] > 1 {print FNR"- "$3}' FILE.txt x=1 FILE.txt 
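
The awk-only command reads the file twice: on the first pass x is unset, so only the counting rule runs; the x=1 assignment between the two file names then switches on the printing rule for the second pass. Below is an annotated sketch of the same idea, run against a temporary copy of the sample data. The variable names sample, count and two_pass are chosen here for illustration.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' \
    'JENNIE;30;DOCTOR;F' \
    'SARA;26;POLICE;F' \
    'EDWARD;32;TEACHER;M' \
    'ROBERT;44;POLICE;M' > "$sample"

# Pass 1 (x is unset): count how often each column-3 value occurs.
# Pass 2 (x=1 is assigned before the file is read again): print the
# per-file line number (FNR) and the value when its count exceeds 1.
two_pass=$(awk -F';' '
    !x                 { count[$3]++ }
    x && count[$3] > 1 { print FNR "- " $3 }
' "$sample" x=1 "$sample")

printf '%s\n' "$two_pass"
rm -f "$sample"
```

FNR is used rather than NR because NR keeps counting across both reads of the file, whereas FNR restarts at 1 for the second pass. The output is already in line order, so no extra sort is needed.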
0

Your current pipeline is unlikely to work the way you claim; at least, it does not with BSD or GNU tools. Perhaps you are using something different.

I was able to come up with the following loop to accomplish what you are asking:

for prof in $(cut -d\; -f3 FILE.txt | sort | uniq -d); do
    awk -v pat="$prof" -F\; '$3 ~ pat{print NR"-",$3}' FILE.txt
done

This produces a list of professions that appear more than once and then uses awk to find each occurrence of them in the file, printing the line number and profession name.

The -v option assigns each profession produced by the cut -d\; -f3 FILE.txt | sort | uniq -d pipeline to the awk variable pat. awk then searches the file for lines whose 3rd field matches that pattern (using ; as the field separator). For each matching line it prints the line number, a dash, and the 3rd field.
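
One caveat with the loop: $3 ~ pat is a regular-expression match, so a duplicated profession also matches any longer value that contains it. The sketch below compares the regex match with an exact string comparison ($3 == pat). It adds a hypothetical fifth record (MARY, POLICEWOMAN) that is not in the original data, purely to show the difference.

```shell
# Sample data from the question plus one hypothetical record whose
# profession merely contains the string POLICE.
sample=$(mktemp)
printf '%s\n' \
    'JENNIE;30;DOCTOR;F' \
    'SARA;26;POLICE;F' \
    'EDWARD;32;TEACHER;M' \
    'ROBERT;44;POLICE;M' \
    'MARY;39;POLICEWOMAN;F' > "$sample"

# Regex match, as in the loop above: also picks up POLICEWOMAN.
regex=$(
    for prof in $(cut -d';' -f3 "$sample" | sort | uniq -d); do
        awk -v pat="$prof" -F';' '$3 ~ pat { print NR "-", $3 }' "$sample"
    done
)

# Exact comparison: only lines whose 3rd field equals the profession.
exact=$(
    for prof in $(cut -d';' -f3 "$sample" | sort | uniq -d); do
        awk -v pat="$prof" -F';' '$3 == pat { print NR "-", $3 }' "$sample"
    done
)

printf 'regex match:\n%s\n' "$regex"
printf 'exact match:\n%s\n' "$exact"
rm -f "$sample"
```

Both variants rely on the shell splitting the command substitution into words, so a profession containing spaces would need a different loop (for example, reading the pipeline line by line with while read).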

  • Wow, that's amazing, but I'm a dummy with awk; I'm not sure how it works, haha. Commented Apr 26, 2019 at 18:19
