
I am relatively new to awk, so I have a simple question about doing division and printing the results in a new column. For example:

    $ head data
    1 13273 . G C 563 5 . 25 128
    1 202259 . G T 675 8 . 12 130
    1 598934 . C C 756 9 . 17 231
    1 634112 . T C 125 1 . 32 89
    1 779762 . G A 675 5 . 28 187

I would like to divide column 9 by column 10 and print the results in a new column 11, preferably sorted from high to low. For example:

    1 634112 . T C 125 1 . 32 89 0.360
    1 13273 . G C 563 5 . 25 128 0.195
    1 779762 . G A 675 5 . 28 187 0.150
    1 202259 . G T 675 8 . 12 130 0.092
    1 598934 . C C 756 9 . 17 231 0.074

I only know how to do it in R, but I wanted to learn how we can do it in awk. Thanks!

2 Answers


Awk is quite expressive with respect to the first requirement. If you want a column 11, you can just invent it and set it equal to the result of dividing column 9 by column 10.

It's possible to do the sort in awk, but it's a bit of a pain, so it's easier to pipe the output to sort. The column command only aligns the output; it's purely cosmetic.

    $ awk '{$11 = $9 / $10}1' file | sort -nr -k 11 | column -t
    1  634112  .  T  C  125  1  .  32  89   0.359551
    1  13273   .  G  C  563  5  .  25  128  0.195312
    1  779762  .  G  A  675  5  .  28  187  0.149733
    1  202259  .  G  T  675  8  .  12  130  0.0923077
    1  598934  .  C  C  756  9  .  17  231  0.0735931
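If you're new to awk, the trailing `1` in the command above may look odd. It is a pattern that is always true, and since it has no action attached, awk applies its default action, which is to print the record. Here is a minimal sketch of the same step with an explicit `print`. The function name `add_ratio` is only an illustrative wrapper and not part of the answer, and `-k11,11nr` is an optional, stricter way of writing the sort key so that it covers field 11 only.

```shell
# add_ratio: hypothetical helper; appends $9/$10 as field 11 to each record
add_ratio() {
  awk '{ $11 = $9 / $10; print }' "$@"
}

# Two records taken from the sample data in the question
printf '%s\n' \
  '1 13273 . G C 563 5 . 25 128' \
  '1 634112 . T C 125 1 . 32 89' |
  add_ratio | sort -k11,11nr
# prints:
# 1 634112 . T C 125 1 . 32 89 0.359551
# 1 13273 . G C 563 5 . 25 128 0.195312
```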

If your output needs to be tab separated, you can set the OFS variable (and forget about the column command):

    $ awk -v OFS='\t' '{$11 = $9 / $10}1' file | sort -nr -k 11
    1 634112 . T C 125 1 . 32 89 0.359551
    1 13273 . G C 563 5 . 25 128 0.195312
    1 779762 . G A 675 5 . 28 187 0.149733
    1 202259 . G T 675 8 . 12 130 0.0923077
    1 598934 . C C 756 9 . 17 231 0.0735931
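One detail about `OFS` that is easy to miss: awk only rejoins the fields with it when the record is rebuilt, for example after a field is assigned. The command above picks up the tabs because `$11` is modified. The small sketch below illustrates the difference on a made-up three-field line; the `$1 = $1` idiom is a common way to force the rebuild and is not part of the answer.

```shell
# No field is modified, so the record is printed unchanged (spaces are kept)
echo 'a b c' | awk -v OFS='\t' '1'

# Assigning a field (even to itself) rebuilds $0 using OFS (tabs)
echo 'a b c' | awk -v OFS='\t' '{ $1 = $1 } 1'
```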

Finally, you can use sprintf to format that last column as in your sample output:

    $ awk -v OFS='\t' '{$11 = sprintf("%.3f", $9 / $10)}1' file | sort -nr -k 11
    1 634112 . T C 125 1 . 32 89 0.360
    1 13273 . G C 563 5 . 25 128 0.195
    1 779762 . G A 675 5 . 28 187 0.150
    1 202259 . G T 675 8 . 12 130 0.092
    1 598934 . C C 756 9 . 17 231 0.074
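To see what the format string changes, here is a tiny standalone sketch (not from the original answer) that compares awk's default number formatting with `sprintf("%.3f", ...)` for the first ratio in the sample data, 32/89:

```shell
# Default output format (OFMT, which is "%.6g" unless changed)
awk 'BEGIN { print 32 / 89 }'                    # prints 0.359551

# Fixed three decimal places, rounded
awk 'BEGIN { print sprintf("%.3f", 32 / 89) }'   # prints 0.360
```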

UPDATE:

As Ed Morton shows in his answer, the ternary operator ?: can be used to protect against dividing by zero. Here I've put "UND" in column 11 to indicate "undefined", but of course you can just leave it blank or put some different value.

    $ awk -v OFS='\t' '{$11 = ($10 != 0) ? sprintf("%.3f", $9 / $10) : "UND"}1' file | sort -nr -k 11
    1 634112 . T C 125 1 . 32 89 0.360
    1 13273 . G C 563 5 . 25 128 0.195
    1 779762 . G A 675 5 . 28 187 0.150
    1 202259 . G T 675 8 . 12 130 0.092
    1 598934 . C C 756 9 . 17 0 UND
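A note on why the "UND" row lands at the bottom: as far as I know, `sort -n` treats a key with no leading number as zero, so with `-nr` such rows come after every positive ratio. The sketch below wraps the same guard in a function so you can try it on a couple of records. The name `safe_ratio` is hypothetical, and the first input line is the sample record with column 10 set to 0 for illustration.

```shell
# safe_ratio: hypothetical helper; writes $9/$10 (3 decimals) to field 11,
# or "UND" when the divisor in field 10 is zero
safe_ratio() {
  awk -v OFS='\t' '{ $11 = ($10 != 0) ? sprintf("%.3f", $9 / $10) : "UND" } 1' "$@"
}

printf '%s\n' \
  '1 598934 . C C 756 9 . 17 0' \
  '1 202259 . G T 675 8 . 12 130' |
  safe_ratio | sort -nr -k 11
# The 0.092 row is printed first, the UND row last.
```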

At some point you might decide that the awk program is getting complicated enough that it's better off in its own file with an emphasis more on readability than compactness.

    $ cat div.awk
    BEGIN { OFS="\t" }
    {
        if ($10 != 0) {
            quotient = $9 / $10
            $11 = sprintf("%.3f", quotient)
        } else {
            $11 = "UND"
        }
        print
    }

    $ awk -f div.awk file | sort -nr -k 11
    1 634112 . T C 125 1 . 32 89 0.360
    1 13273 . G C 563 5 . 25 128 0.195
    1 779762 . G A 675 5 . 28 187 0.150
    1 202259 . G T 675 8 . 12 130 0.092
    1 598934 . C C 756 9 . 17 0 UND

3 Comments

Thanks for your code! I wonder what we should do if $10 is zero? How to prevent getting an error?
You're welcome! See update for one way of checking for zero.
Perfect! Thanks so much!

With GNU awk for sorted_in:

    $ cat tst.awk
    { a[NR]=$0; v[NR]=$9/$10 }
    END {
        PROCINFO["sorted_in"]="@val_num_desc"
        for (i in v) {
            print a[i] "\t" v[i]
        }
    }

    $ awk -f tst.awk file
    1 634112 . T C 125 1 . 32 89 0.359551
    1 13273 . G C 563 5 . 25 128 0.195312
    1 779762 . G A 675 5 . 28 187 0.149733
    1 202259 . G T 675 8 . 12 130 0.0923077
    1 598934 . C C 756 9 . 17 231 0.0735931

Change v[NR]=$9/$10 to v[NR]=($10==0 ? 0 : $9/$10) or similar to protect against divide-by-zero if $10 can be zero.
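For reference, here is a sketch of the script with that guard applied, wrapped in a shell function so it can be pasted into a terminal. The name `sorted_ratio` is hypothetical, the function needs GNU awk to be installed as `gawk`, and a zero divisor yields a ratio of 0 here, as in the suggestion above.

```shell
# sorted_ratio: hypothetical wrapper; requires GNU awk for PROCINFO["sorted_in"]
sorted_ratio() {
  gawk '
    # Store each record and its ratio, guarding against a zero divisor
    { a[NR] = $0; v[NR] = ($10 == 0 ? 0 : $9 / $10) }
    END {
      # Make "for (i in v)" visit elements by numeric value, high to low
      PROCINFO["sorted_in"] = "@val_num_desc"
      for (i in v) {
        print a[i] "\t" v[i]
      }
    }
  ' "$@"
}
```

Call it as `sorted_ratio file`; rows with a zero in column 10 then sort below all rows with a positive ratio.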
