Jerry

Bash: Nested while loop to detect duplicates and number the duplicates

I have a text file of gene headers. Several gene sequences can belong to the same species, so the same header appears more than once. I extracted the headers into headers.txt, copied them into uniqueheaders.txt, and removed all duplicates from uniqueheaders.txt. I am trying to read uniqueheaders.txt line by line and, for each line, loop over headers.txt to find its duplicates. The if statement detects a match and increments a counter, which I want to append to the header. That would number every header in headers.txt so I can insert them back into my FASTA file. My code is here:

    while IFS= read -r uniqueline
    do
        counter=0
        while IFS= read headline
        do
            if [ "$uniqueline" == "$headline" ]
            then
                let "counter++"
                #append counter to the headline variable to number it.
                sed "$headline s/$/$counter/" -i headers
            fi
        done < headers.txt
    done < uniqueheaders.txt

The issue is that the terminal keeps printing two errors: "sed: -e expression #1, char 1: unknown command: 'M'" and "sed: -e expression #1, char 2: extra characters after command". Both files contain header names like these:

Mus musculus

Homo sapiens

Rattus norvegicus

How do I modify the sed to prevent this error? Is there a better way of doing this in bash?
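For reference, here is a minimal sketch of one possible single-pass alternative, assuming awk is available and that the counter should be appended directly to the end of each header, as the sed substitution above attempts. The function name number_headers is hypothetical, and the inline sample data stands in for headers.txt. It avoids sed altogether: in the loop above, the header text is expanded into the position where sed expects an address or command, so sed tries to parse "Mus musculus" as a command.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the original script):
# reads headers on stdin and prints each one followed by the number
# of times that exact line has been seen so far.
number_headers() {
    # seen[] is keyed by the whole line ($0); increment first, then print
    awk '{ seen[$0]++; print $0 seen[$0] }'
}

# Inline sample data standing in for headers.txt (assumed format: one header per line)
printf '%s\n' 'Mus musculus' 'Homo sapiens' 'Mus musculus' 'Rattus norvegicus' | number_headers
```

With the real file this would be run as number_headers < headers.txt > numbered_headers.txt (the output filename is only an example), and uniqueheaders.txt is not needed because the counts are kept per header in a single pass.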
