How to remove the comma and print the entire row again for the words which are placed after the comma

Question

File:

chr1_156186369 chr1_156186369_A_C,T A C,T 33150.29 1/2:0,4,6:10:88:272
chr19_27732257 chr19_27732257_G_C G C 262.29 1/2:1,10,7:18:99:414,167
chrM_2619 chrM_2619_A_G,T A G,T 33023.29 1/2:0,5,5:10:99:293,144,129
chr9_119375271 chr9_119375271_T_A,G T A,G 248.29 1/2:1,11,5:17:99:359,107,113

I need to remove the comma from columns 2 and 4 only, and print the entire row once for each of the values separated by the comma.

Expected output is:

chr1_156186369 chr1_156186369_A_C A C 33150.29 1/2:0,4,6:10:88:272
chr1_156186369 chr1_156186369_A_T A T 33150.29 1/2:0,4,6:10:88:272
chr19_27732257 chr19_27732257_G_C G C 262.29 1/2:1,10,7:18:99:414,167
chrM_2619 chrM_2619_A_G A G 33023.29 1/2:0,5,5:10:99:293,144,129
chrM_2619 chrM_2619_A_T A T 33023.29 1/2:0,5,5:10:99:293,144,129
chr9_119375271 chr9_119375271_T_A T A 248.29 1/2:1,11,5:17:99:359,107,113
chr9_119375271 chr9_119375271_T_G T G 248.29 1/2:1,11,5:17:99:359,107,113

I tried awk but did not get any result. I also read a similar question here: How to extract line from the file on specific condition

Would be nice to see what you tried.

– Tigger, Commented Nov 7, 2016 at 10:21

rudimeier · Accepted Answer · 2016-11-07 11:49:11Z

Using awk:

awk '{ n = split($4, w4, ","); split($2, w2, ","); for (i = 1; i <= n; i++) print $1, substr(w2[1], 1, length(w2[1]) - length(w4[1])) w4[i], $3, w4[i], $5, $6 }'

Note that there is no error handling for the case where the values after the comma differ between columns 2 and 4.
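One way to add the missing check, as a rough sketch: compare the text after the last underscore in column 2 with column 4 before splitting anything. The function name check_alts is made up for illustration, and the sketch assumes column 2 always ends in an underscore followed by the same comma-separated list as column 4, as in the sample data.

```shell
# check_alts: hypothetical helper, reads rows on stdin.
# Exits non-zero if any row has different alternatives in columns 2 and 4.
check_alts() {
    awk 'BEGIN { bad = 0 }
    {
        alt2 = $2
        sub(/.*_/, "", alt2)    # keep only the text after the last underscore
        if (alt2 != $4) {
            # report the mismatch on stderr and remember that one occurred
            print "line " NR ": column 2 ends in " alt2 ", column 4 is " $4 | "cat 1>&2"
            bad = 1
        }
    }
    END { exit bad }'
}

# Example: validate one sample row
printf '%s\n' 'chr1_156186369 chr1_156186369_A_C,T A C,T 33150.29 1/2:0,4,6:10:88:272' | check_alts
```

Running it on the input first and only splitting when it exits with status 0 keeps the splitting command itself short.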

Sundeep · Accepted Answer · 2016-11-07 13:38:40Z

With sed, assuming that the values are single characters separated by a comma (like C,T) and that the same pair appears twice on the line:

$ sed -E 's/^(.*)([A-Z]),([A-Z])(.*)\2,\3(.*)/\1\2\4\2\5\n\1\3\4\3\5/' ip.txt
chr1_156186369 chr1_156186369_A_C A C 33150.29 1/2:0,4,6:10:88:272
chr1_156186369 chr1_156186369_A_T A T 33150.29 1/2:0,4,6:10:88:272
chr19_27732257 chr19_27732257_G_C G C 262.29 1/2:1,10,7:18:99:414,167
chrM_2619 chrM_2619_A_G A G 33023.29 1/2:0,5,5:10:99:293,144,129
chrM_2619 chrM_2619_A_T A T 33023.29 1/2:0,5,5:10:99:293,144,129
chr9_119375271 chr9_119375271_T_A T A 248.29 1/2:1,11,5:17:99:359,107,113
chr9_119375271 chr9_119375271_T_G T G 248.29 1/2:1,11,5:17:99:359,107,113

^(.*) starting text
([A-Z]),([A-Z]) comma separated single characters
(.*) text in between the repetition
\2,\3 match the comma separated single characters again
(.*) rest of line
\1\2\4\2\5\n\1\3\4\3\5 required output format
Note that spacing doesn't exactly match with expected output

your one line code is awesome, thanks for your help

– Sunil Pachakar, Commented Nov 8, 2016 at 8:08

jordix · Accepted Answer · 2016-11-07 11:02:23Z

I don't know how to do it with a single command, but it works with this loop in bash:

while IFS= read -r line
do
    if echo "${line}" | grep -q '[[:alpha:]],[[:alpha:]]'
    then
        # first comma-separated pair of letters on the line, e.g. C,T
        letters=$(echo "${line}" | grep -o '[[:alpha:]],[[:alpha:]]' | head -n 1)
        for letter in $(echo "${letters}" | sed 's/,/ /g')
        do
            # replace every occurrence of the pair with one of its letters
            echo "${line}" | sed 's/'"${letters}"'/'"${letter}"'/g'
        done
    else
        echo "${line}"
    fi
done < data.dat

fedorqui · Accepted Answer · 2016-11-07 11:15:19Z

Split the 4th field on the comma and print one row per slice, using the slice as the 4th column and also using it to replace the trailing X,Y in the 2nd column, if there is one:

awk '{
  n=split($4,slices,",")
  for(i=1;i<=n;i++) {
    res=$2
    sub(/.,.*/,slices[i],res)
    print $1, res, $3, slices[i], $5, $6
  }
}' file

I don't much like how I print the fields, since I name each one explicitly from the 1st to the 6th, so hopefully the number of columns is fixed.

$ awk '{n=split($4,slices,","); for(i=1;i<=n;i++) {res=$2; sub(/.,.*/,slices[i],res); print $1, res, $3, slices[i], $5, $6}}' a
chr1_156186369 chr1_156186369_A_C A C 33150.29 1/2:0,4,6:10:88:272
chr1_156186369 chr1_156186369_A_T A T 33150.29 1/2:0,4,6:10:88:272
chr19_27732257 chr19_27732257_G_C G C 262.29 1/2:1,10,7:18:99:414,167
chrM_2619 chrM_2619_A_G A G 33023.29 1/2:0,5,5:10:99:293,144,129
chrM_2619 chrM_2619_A_T A T 33023.29 1/2:0,5,5:10:99:293,144,129
chr9_119375271 chr9_119375271_T_A T A 248.29 1/2:1,11,5:17:99:359,107,113
chr9_119375271 chr9_119375271_T_G T G 248.29 1/2:1,11,5:17:99:359,107,113
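If the number of columns is not fixed, a possible variant (only a sketch, not tested beyond the sample rows) is to assign to $2 and $4 and print the whole record, so that awk carries along whatever other fields exist. The function name split_alts is made up for illustration. Note that assigning to a field makes awk rebuild the line with single spaces between fields.

```shell
# split_alts: hypothetical helper, reads rows on stdin and prints
# one row per comma-separated alternative found in column 4.
split_alts() {
    awk '{
        n = split($4, alts, ",")    # alternatives listed in column 4
        id = $2
        sub(/,.*/, "", id)          # column 2 up to the first alternative
        stem = substr(id, 1, length(id) - length(alts[1]))
        for (i = 1; i <= n; i++) {
            $2 = stem alts[i]       # column 2 rebuilt with the i-th alternative
            $4 = alts[i]            # column 4 reduced to the i-th alternative
            print                   # whole record, however many fields it has
        }
    }'
}

# Example: a row with one extra trailing column
printf '%s\n' 'chrM_2619 chrM_2619_A_G,T A G,T 33023.29 1/2:0,5,5:10:99:293,144,129 extra' | split_alts
```

Rows without a comma in column 4 pass through with n equal to 1, so they are printed once and unchanged apart from the spacing.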
