I am rather new to bioinformatics (this is my first post!) and I would appreciate some help with a task that has me stuck.

I have a Tab-delimited data table with three columns:

AATTCTTGCA 4 [A/T]
AATTCCTTCG 7 [C/T]
AATTCAACAA 2 [T/C]

I would like to replace the character in the first column at the position indicated by the second column with the string in the third column so that the output is:

AAT[A/T]CTTGCA
AATTCC[C/T]TCG
A[T/C]TTCAACAA

I am working through various tutorials now and will update my post when I have some (failed) commands with sed/awk.

Thanks in advance!

2 Answers

The following awk command should do the task:

awk -F"\t" '{printf "%s%s%s%s",substr($1,1,$2-1),$3,substr($1,$2+1),ORS}' input.txt 

The option -F sets the field separator to a tab. For every line, the program then prints (using printf)

  • the substring of field 1 from the beginning up to (but excluding) the character position indicated in field 2
  • the string contained in field 3
  • the remainder of field 1, starting one past the character position indicated in field 2
  • the "output record separator", which defaults to new-line

thereby effectively replacing the indicated character with the content of field 3.
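
To make the three pieces concrete, here is a small sketch that prints each one separately for the first sample record. The function name show_pieces and the labels are made up for illustration only; they are not part of the command above.

```shell
# show_pieces is a hypothetical helper, used only to illustrate the split.
show_pieces() {
  printf 'AATTCTTGCA\t4\t[A/T]\n' |
    awk -F'\t' '{
      print "before: " substr($1, 1, $2 - 1)  # characters before position $2
      print "insert: " $3                     # replacement string from field 3
      print "after:  " substr($1, $2 + 1)     # characters after position $2
    }'
}

show_pieces
```

For this record the pieces are AAT, [A/T], and CTTGCA, which concatenate to AAT[A/T]CTTGCA.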

Note that in hindsight this amount of explicit formatting control is actually not necessary, and the program can be abbreviated to

awk -F"\t" '{print substr($1,1,$2-1) $3 substr($1,$2+1)}' input.txt 

Caveat: The program assumes that the character position in field 2 is always reasonable, i.e. greater than 0 and less than or equal to the total length of field 1. If the file may contain corrupt records, more error-checking is needed.
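
If you do need that kind of checking, a minimal sketch could look like the following. The wrapper name replace_at and the wording of the warning are my own illustration; it also assumes your awk accepts "/dev/stderr" as an output target, which common implementations on Linux and macOS do.

```shell
# replace_at is a hypothetical wrapper, not part of the original command.
# It reads tab-delimited records from the given files (or from stdin).
replace_at() {
  awk -F'\t' '
    # Report and skip records whose position is not an integer
    # between 1 and the length of field 1.
    $2 !~ /^[0-9]+$/ || $2 < 1 || $2 > length($1) {
      printf("line %d: invalid position \"%s\"\n", NR, $2) > "/dev/stderr"
      next
    }
    # Valid record: same substitution as in the short command.
    { print substr($1, 1, $2 - 1) $3 substr($1, $2 + 1) }
  ' "$@"
}
```

It would be called as replace_at input.txt; bad records produce a warning on standard error and are left out of the output, so you may prefer to abort instead, depending on how strict you want to be.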

Using Raku (formerly known as Perl_6)

raku -ne 'my ($a,$b,$c) = .split("\t"); substr-rw($a, $b-1, 1) = $c; put $a;' 

Sample Input:

AATTCTTGCA 4 [A/T]
AATTCCTTCG 7 [C/T]
AATTCAACAA 2 [T/C]

Sample Output:

AAT[A/T]CTTGCA
AATTCC[C/T]TCG
A[T/C]TTCAACAA

Briefly, the data is read in line by line using the -ne command-line flags. Each line is split on tabs, and the three fields are assigned to the scalars $a, $b, and $c. The substr-rw routine (a writable variant of substr) is then used to select the substring of the nucleotide sequence in $a that starts at index $b-1 (indices are zero-based) and has length 1, and to assign the string in $c to it, i.e. to replace one nucleotide. The revised sequence in $a is then output.

https://docs.raku.org/routine/substr-rw
https://raku.org
