How to split awk field correctly

Question

I have a file (test.bed) that looks like this (it might not be tab-separated):

chr1 10002 10116 id=1;frame=0;strand=+; 0 +
chr1 10116 10122 id=2;frame=0;strand=+; 0 +
chr1 10122 10128 id=3;frame=0;strand=+; 0 +
chr1 10128 10134 id=4;frame=0;strand=+; 0 +
chr1 10134 10140 id=5;frame=0;strand=+; 0 +
chr1 10140 10146 id=6;frame=0;strand=+; 0 +
chr1 10146 10182 id=7;frame=0;strand=+; 0 +
chr1 10182 10188 id=8;frame=0;strand=+; 0 +
chr1 10188 10194 id=9;frame=0;strand=+; 0 +
chr1 10194 10200 id=10;frame=0;strand=+; 0 +

I want to produce the following output (which should be tab-separated):

chr1 10002 10116 id=1 0 +
chr1 10116 10122 id=2 0 +
chr1 10122 10128 id=3 0 +
chr1 10128 10134 id=4 0 +
chr1 10134 10140 id=5 0 +
chr1 10140 10146 id=6 0 +
chr1 10146 10182 id=7 0 +
chr1 10182 10188 id=8 0 +
chr1 10188 10194 id=9 0 +
chr1 10194 10200 id=10 0 +

I have tried with the following code:

awk 'OFS="\t" split ($0, a, ";"){print a[1],$5,$6}' test.bed

But then I get:

chr1 10002 10116 id=1 40 4+
chr1 10116 10122 id=2 40 4+
chr1 10122 10128 id=3 40 4+
chr1 10128 10134 id=4 40 4+
chr1 10134 10140 id=5 40 4+
chr1 10140 10146 id=6 40 4+
chr1 10146 10182 id=7 40 4+
chr1 10182 10188 id=8 40 4+
chr1 10188 10194 id=9 40 4+
chr1 10194 10200 id=10 40 4+

What am I doing wrong? Somehow the number '4' is added to the last two fields. I thought the number '4' somehow might have something to do with splitting in the 4th field, however, I tried producing a similar file where it was the 3rd field that was split, and still got the number '4' added to the last two fields. I am rather new to 'awk' so I guess it is an error in the syntax. Any help would be appreciated.

Try sed 's/;frame=0;strand=+;//'

– kev, Commented May 14, 2013 at 9:21
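kev's sed one-liner can be checked on a sample record. It deletes only that exact literal text, so a line with a different frame or strand value would pass through unchanged, and the remaining separators are whatever the input already had (spaces here, not tabs). The second command is a hypothetical generalization, not part of the comment: it deletes from the first ; up to the next whitespace. The shell variable line is just an illustrative sample.

```shell
line='chr1 10002 10116 id=1;frame=0;strand=+; 0 +'

# kev's substitution: delete the literal text ";frame=0;strand=+;".
printf '%s\n' "$line" | sed 's/;frame=0;strand=+;//'

# Hypothetical generalization (not part of the comment): delete from the first
# ";" up to the next whitespace, whatever the frame/strand values are.
printf '%s\n' 'chr1 10002 10116 id=1;frame=2;strand=-; 0 -' | sed 's/;[^[:space:]]*//'
```

The first command prints chr1 10002 10116 id=1 0 + and the second prints chr1 10002 10116 id=1 0 -, both still space-separated.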

Chris Seymour · Accepted Answer · 2013-05-14 10:25:34Z

If you set your field separator to whitespace or semicolons, you won't have to handle the splitting yourself:

$ awk '{print $1,$2,$3,$4,$8,$9}' FS='[[:space:]]+|;' OFS='\t' file
chr1 10002 10116 id=1 0 +
chr1 10116 10122 id=2 0 +
chr1 10122 10128 id=3 0 +
chr1 10128 10134 id=4 0 +
chr1 10134 10140 id=5 0 +
chr1 10140 10146 id=6 0 +
chr1 10146 10182 id=7 0 +
chr1 10182 10188 id=8 0 +
chr1 10188 10194 id=9 0 +
chr1 10194 10200 id=10 0 +
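The $8 and $9 in this command follow from how FS='[[:space:]]+|;' splits a record: the ; right after strand=+ and the space that follows it are two separate separator matches, so an empty field sits between them. A small sketch that lists every field of one sample record (the shell variable line is only an illustration):

```shell
line='chr1 10002 10116 id=1;frame=0;strand=+; 0 +'

# Same field separator as the answer: a run of whitespace, or a single ";".
# Print every field with its number; brackets make empty fields visible.
printf '%s\n' "$line" |
awk -F '[[:space:]]+|;' '{ for (i = 1; i <= NF; i++) printf "$%d = [%s]\n", i, $i }'
```

This lists nine fields: $4 is id=1, $5 and $6 are frame=0 and strand=+, $7 is empty, and $8 and $9 are 0 and +.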

As for what you are doing wrong in:

awk 'OFS="\t" split ($0, a, ";"){print a[1],$5,$6}'

The syntax of awk is condition{block}, and setting the value of OFS and splitting are not conditions. They are statements that belong inside the block. awk accepts your line anyway, because any expression can serve as a pattern, but the result is not what you intended.
However, you don't need to set OFS on every line, so it should be initialized only once. You can do this with the -v option, in a BEGIN block, or with an assignment after the script.
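A quick way to confirm where the stray 4 comes from is to run the original pattern on one record from the question and inspect OFS afterwards. This is a sketch: the shell variable line holds a sample record, and tr is used only to display each tab as |.

```shell
line='chr1 10002 10116 id=1;frame=0;strand=+; 0 +'

# The pattern is unchanged from the question. awk parses it as the assignment
#   OFS = ("\t" split($0, a, ";"))
# i.e. a tab concatenated with split()'s return value (the number of pieces).
# The resulting string is non-empty, so the pattern is true and the action runs.
printf '%s\n' "$line" |
awk 'OFS="\t" split($0, a, ";") {
    printf "length(OFS) = %d, character after the tab = %s\n", length(OFS), substr(OFS, 2)
    print a[1], $5, $6
}' |
tr '\t' '|'   # show each tab as "|" so the separators are visible
```

It prints length(OFS) = 2, character after the tab = 4 followed by chr1 10002 10116 id=1|40|4+, which is exactly the 40 4+ seen in the question.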

Valid alternatives:

$ awk -v OFS='\t' '{split($0,a,";");print a[1],$5,$6}' file
$ awk 'BEGIN{OFS="\t"}{split($0,a,";");print a[1],$5,$6}' file
$ awk '{split ($0,a,";");print a[1],$5,$6}' OFS='\t' file
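To check that the three alternatives are interchangeable, the sketch below runs each one on a small temporary file (a hypothetical stand-in for test.bed holding two records from the question) and pipes the result through tr so tabs show up as |.

```shell
# Hypothetical sample file standing in for test.bed.
tmp=$(mktemp)
printf '%s\n' \
  'chr1 10002 10116 id=1;frame=0;strand=+; 0 +' \
  'chr1 10116 10122 id=2;frame=0;strand=+; 0 +' > "$tmp"

# The three ways of setting OFS once; tr displays each tab as "|".
awk -v OFS='\t' '{split($0,a,";");print a[1],$5,$6}' "$tmp" | tr '\t' '|'
awk 'BEGIN{OFS="\t"}{split($0,a,";");print a[1],$5,$6}' "$tmp" | tr '\t' '|'
awk '{split($0,a,";");print a[1],$5,$6}' OFS='\t' "$tmp" | tr '\t' '|'

rm -f "$tmp"
```

Each command prints the same two lines, starting with chr1 10002 10116 id=1|0|+. Note that only the last two separators are tabs, because a[1] keeps the original spaces of the first four columns.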

Thank you, that does the job. Any idea what happens in my code to produce the number 4?
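It's the return value from split(). awk parses your pattern as a single assignment, OFS = "\t" split($0, a, ";"), so the tab is concatenated with the number of ;-separated pieces on the line (4 here) and OFS becomes a tab followed by 4. That is also why the 4 appeared no matter which field you split: it depends on how many semicolons the line has, not on their position. You wrote the awk program in an improper format: all your actions should be inside {...}. I changed your awk like this: awk 'OFS="\t" {split ($0, a, ";");print a[1],$5,$6}' (notice the { moved before split), and it worked properly.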
Thank you for the explanation, that was very helpful. However, I guess this is not quite the way to do it after all, as it only tab-separates the last fields.
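One way to get every column tab-separated while staying close to the original split() idea is to keep awk's default field splitting and call split() only on the 4th field. This is a sketch on a single sample record; parts is just an illustrative array name, and the final tr stage only makes the tabs visible (drop it to keep real tabs). Since the default FS splits on runs of spaces or tabs, this works whether or not the input is tab-separated.

```shell
line='chr1 10002 10116 id=1;frame=0;strand=+; 0 +'

# Default FS splits on runs of blanks (spaces or tabs), so $1..$6 are already
# separate; only the 4th field needs splitting on ";" to keep "id=1".
printf '%s\n' "$line" |
awk -v OFS='\t' '{ split($4, parts, ";"); print $1, $2, $3, parts[1], $5, $6 }' |
tr '\t' '|'   # display only: shows each tab as "|"; drop this stage to keep real tabs
```

This prints chr1|10002|10116|id=1|0|+, i.e. a tab between every pair of columns.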

Sidharth C. Nadhan · Accepted Answer · 2013-05-14 09:19:32Z


Try this:

awk -F\; '{print $1,$4}' test.bed
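To see why this does not give the requested layout: with ; as the separator (-F';' is equivalent to -F\;), awk sees only four fields, and $4 still starts with a space. The sketch below uses one sample record, adds -v OFS='\t', and uses tr to display the tab as |.

```shell
line='chr1 10002 10116 id=1;frame=0;strand=+; 0 +'

# With ";" as FS the record splits into 4 fields:
#   $1 = "chr1 10002 10116 id=1", $2 = "frame=0", $3 = "strand=+", $4 = " 0 +"
printf '%s\n' "$line" | awk -F';' '{ print NF }'

# Even with a tab OFS, only one tab is inserted (between $1 and $4).
printf '%s\n' "$line" | awk -F';' -v OFS='\t' '{ print $1, $4 }' | tr '\t' '|'
```

The first command prints 4 and the second prints chr1 10002 10116 id=1| 0 +, so the first four columns keep their spaces.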



Chris Seymour Over a year ago

This won't allow the output to be separated as required.

user53416 Over a year ago

And this works as well, but I guess I will have to specify the output separator if the input isn't tab-separated.
