How to replace the character in one column at the position indicated by another column with the string in a different column

Question

I am rather new to bioinformatics (this is my first post!) and I would appreciate some help on a task that has me stuck.

I have a Tab-delimited data table with three columns:

AATTCTTGCA 4 [A/T]
AATTCCTTCG 7 [C/T]
AATTCAACAA 2 [T/C]

I would like to replace the character in the first column at the position indicated by the second column with the string in the third column so that the output is:

AAT[A/T]CTTGCA
AATTCC[C/T]TCG
A[T/C]TTCAACAA

I am working through various tutorials now and will update my post when I have some (failed) commands with sed/awk.

Thanks in advance!

AdminBee · Accepted Answer · 2022-04-28 07:27:53Z

The following awk command should do the task:

awk -F"\t" '{printf "%s%s%s%s",substr($1,1,$2-1),$3,substr($1,$2+1),ORS}' input.txt

The option -F sets the field separator to TAB. The program will then print (using the printf() function) for every line

the substring of field 1 from the beginning up to (but excluding) the character position indicated in field 2
the string contained in field 3
the remainder of field 1, starting one past the character position indicated in field 2
the "output record separator", which defaults to new-line

thereby effectively replacing the indicated character with the content of field 3.
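To make the pieces concrete, here is a small sketch that prints each part separately for the first sample row. The function name show_parts is hypothetical and exists only for this illustration; it is not part of the original answer.

```shell
# Hypothetical helper: print the three pieces that the awk program concatenates.
# Arguments: sequence, 1-based position, replacement string.
show_parts() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3" |
    awk -F"\t" '{
        print "before: " substr($1, 1, $2-1)   # characters 1 .. pos-1
        print "insert: " $3                    # replacement string from field 3
        print "after:  " substr($1, $2+1)      # characters pos+1 .. end
    }'
}

show_parts AATTCTTGCA 4 "[A/T]"
```

Joining the three printed pieces in order gives AAT[A/T]CTTGCA, the first line of the desired output.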

Note that in hindsight this amount of explicit formatting control is actually not necessary, and the program can be abbreviated to

awk -F"\t" '{print substr($1,1,$2-1) $3 substr($1,$2+1)}' input.txt

Caveat: The program assumes that the character position in field 2 is always reasonable, i.e. greater than 0 and less than or equal to the total length of field 1. If the file may be corrupt, more error-checking is needed.
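One possible way to add such checks is sketched below. The wrapper name replace_checked, the wording of the warning, and the decision to skip bad rows (rather than abort) are all assumptions made for illustration. Writing to "/dev/stderr" works in gawk, mawk, and on systems that provide that device file, but it is not guaranteed by POSIX awk.

```shell
# Hypothetical wrapper: replace only when the position is valid, warn otherwise.
replace_checked() {
    awk -F"\t" '
        # Skip rows without three fields, with a non-integer position,
        # or with a position outside the sequence in field 1.
        NF != 3 || $2 !~ /^[0-9]+$/ || $2 < 1 || $2 > length($1) {
            printf "line %d: skipped (bad field count or position)\n", NR > "/dev/stderr"
            next
        }
        # Valid row: same replacement as in the short version of the answer.
        { print substr($1, 1, $2-1) $3 substr($1, $2+1) }
    ' "$@"
}

# Second row has position 9 in a 5-character sequence, so it is skipped.
printf 'AATTCTTGCA\t4\t[A/T]\nAATTC\t9\t[C/T]\n' | replace_checked
```

With a file, the call would be replace_checked input.txt. Skipped rows are reported on standard error, so standard output contains only the successfully rewritten sequences.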

jubilatious1 · Accepted Answer · 2022-06-16 04:12:39Z

Using Raku (formerly known as Perl_6)

raku -ne 'my ($a,$b,$c) = .split("\t"); substr-rw($a, $b-1, 1) = $c; put $a;'

Sample Input:

AATTCTTGCA 4 [A/T]
AATTCCTTCG 7 [C/T]
AATTCAACAA 2 [T/C]

Sample Output:

AAT[A/T]CTTGCA
AATTCC[C/T]TCG
A[T/C]TTCAACAA

Briefly, the input is read line by line using the -ne command-line flags. Each line is split on tabs and the pieces are assigned to the scalars $a, $b, and $c. The substr-rw routine (a version of substr that returns a writable reference into the string) is then used to assign the string $c to the part of the nucleotide sequence $a that starts at zero-based offset $b-1 and has length 1, i.e. replacing one nucleotide. The revised sequence in $a is then printed with put.

https://docs.raku.org/routine/substr-rw
https://raku.org
