deleted 8 characters in body
Jerry

I have a text file of gene sequences in which each sequence sits under a header naming its species, and the same species can appear several times with different sequences. I extracted the headers into one file (headers.txt), copied it to another file (uniqueheaders.txt), and removed all the duplicates from uniqueheaders.txt.

I am trying to read uniqueheaders.txt line by line and, for each line, loop over headers.txt to check for matching headers. The if statement detects a match and increments a counter, which I then want to append to that header. This should number all the headers in headers.txt so that I can insert them back into my FASTA file. My code is here:

while IFS= read -r uniqueline
do
    counter=0
    while IFS= read headline
    do
        if [ "$uniqueline" == "$headline" ]
        then
            let "counter++"
            #append counter to the headline variable to number it.
            sed "$headline s/$/$counter/" -i headers
        fi
    done < headers.txt
done < uniqueheaders.txt

The issue is that the terminal keeps spitting out the error

sed: -e expression #1, char 1: unknown command: 'M' 

and

sed: -e expression #1, char 2: extra characters after command 

Both files contain unique header names:

Mus musculus
Homo sapiens
Rattus norvegicus

How do I modify the sed command to prevent this error? Is there a better way of doing this in bash?
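
On the sed error itself, here is a minimal sketch, assuming the header text contains no slashes or regex metacharacters. The expression "$headline s/$/$counter/" expands to something like Mus musculus s/$/1/, so sed tries to parse the header text as a command, and M is not one. With Homo sapiens, H happens to be a valid sed command, which likely explains the second message about extra characters. Wrapping the header in a /^...$/ address avoids the parse error, although it appends the counter to every matching line rather than to a single occurrence. The temporary file is only a stand-in for headers.txt.

```shell
#!/usr/bin/env bash
# Stand-in for headers.txt (assumption: contents taken from the example input).
tmp=$(mktemp)
printf '%s\n' 'Mus musculus' 'Rattus norvegicus' 'Mus musculus' > "$tmp"

headline='Mus musculus'
counter=1

# Original form: the script text becomes "Mus musculus s/$/1/", so sed
# reads "M" as a command name and exits with an error.
if ! sed "$headline s/$/$counter/" "$tmp" >/dev/null 2>&1; then
    echo "unaddressed form failed, as in the question"
fi

# Address form: /^Mus musculus$/ selects lines equal to the header, and the
# s command then appends the counter to the end of each selected line.
result=$(sed "/^$headline\$/ s/\$/$counter/" "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Because every matching line receives the same suffix, this form on its own does not give each duplicate a different number.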

Example input (the number of lines each gene sequence takes up follows no pattern, and the gene sequences are all in one file):

Mus musculus
MDFJSGHDFSBGKJBDFSGKJBDFS
NGBJDFSBGKJDFSHNGKJDFSGHG
Rattus norvegicus
SNOFBDSFNLSFSFSFSJFJSDFSD
Mus musculus
NJALDJASJDLAJSJAPOJPOASDJG
DSFHBDSFHSDFHDFSHJDFSJKSSF

Desired output:

Mus musculus1
MDFJSGHDFSBGKJBDFSGKJBDFS
NGBJDFSBGKJDFSHNGKJDFSGHG
Rattus norvegicus
SNOFBDSFNLSFSFSFSJFJSDFSD
Mus musculus2
NJALDJASJDLAJSJAPOJPOASDJG
DSFHBDSFHSDFHDFSHJDFSJKSSF
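
One possible way to get from the example input to the desired output without nested loops is a short awk program run over the sequence file directly. This is only a sketch under stated assumptions: genes.txt is a hypothetical name for the FASTA file, header lines are recognised solely by appearing in uniqueheaders.txt, neither file is empty, and a header is numbered only when it occurs more than once (which is what leaves Rattus norvegicus unnumbered in the desired output).

```shell
#!/usr/bin/env bash
# Build sample files in a temporary directory (assumption: genes.txt is a
# hypothetical name; the contents mirror the example input).
dir=$(mktemp -d)
printf '%s\n' 'Mus musculus' 'Homo sapiens' 'Rattus norvegicus' > "$dir/uniqueheaders.txt"
printf '%s\n' \
    'Mus musculus' 'MDFJSGHDFSBGKJBDFSGKJBDFS' 'NGBJDFSBGKJDFSHNGKJDFSGHG' \
    'Rattus norvegicus' 'SNOFBDSFNLSFSFSFSJFJSDFSD' \
    'Mus musculus' 'NJALDJASJDLAJSJAPOJPOASDJG' 'DSFHBDSFHSDFHDFSHJDFSJKSSF' \
    > "$dir/genes.txt"

# The sequence file is listed twice: once to count each header, once to print.
result=$(awk '
    FNR == 1  { pass++ }                                  # a new input file starts
    pass == 1 { is_header[$0] = 1; next }                 # remember known header names
    pass == 2 { if ($0 in is_header) total[$0]++; next }  # count header occurrences
    ($0 in is_header) && total[$0] > 1 {                  # repeated header: append 1, 2, ...
        print $0 (++seen[$0])
        next
    }
    { print }                                             # everything else is unchanged
' "$dir/uniqueheaders.txt" "$dir/genes.txt" "$dir/genes.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

Reading the sequence file twice keeps the program simple at the cost of a second pass. If every header should be numbered regardless of how often it occurs, the counting pass and the total[$0] > 1 condition could be dropped.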

Formatting and tags
AdminBee

added 541 characters in body
Jerry

