I have a txt file that contains 10177 columns and approximately 450,000 rows, separated by tabs. I am trying to trim the file down using awk so that it only prints columns 1-3, column 5, and every 14th column after the fifth one, counting the fifth as the first (i.e. the 18th, 31st, and so on).
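The sample output keeps column 18 right after column 5. That means the fifth column counts as the first of each group of 14, which is a step of 13. The `i+=14` in the command quoted in the question would pick column 19 instead. The sketch below is one way to check which column numbers a given step keeps. It assumes that step of 13, and `kept_columns` is a hypothetical helper name.

```shell
# kept_columns is a hypothetical helper: it lists the column numbers that would be kept
# for a row with $1 columns, assuming a step of 13 after column 5 (5, 18, 31, ...).
kept_columns() {
  awk -v nf="$1" 'BEGIN {
    out = "1 2 3"                      # the first three columns
    for (i = 5; i <= nf; i += 13)      # column 5, then every 13th column after it
      out = out " " i
    print out
  }'
}

# kept_columns 40                 # prints: 1 2 3 5 18 31
# kept_columns 10177 | wc -w      # 786 columns in total
```

Under that assumption, the full 10177-column file would be trimmed to 786 columns, and the last one kept would be column 10171.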
My file has a format that looks like:

    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ... 10177
    A B C D E F G H I J K L M N O P Q R S T ...
    X Y X Y X Y X Y X Y X Y X Y X Y X Y X Y ...

I am hoping to generate an output txt file (also tab-separated) that contains:

    1 2 3 5 18 ...
    A B C E R ...
    X Y X X Y ...

The current awk code I have looks like this (I am running it under Cygwin):

    $ awk -F"\t" '{OFS="\t"} { for(i=5;i<10177;i+=14) printf ($i) }' test2.txt > test3.txt

But the result I am getting looks something like this:

    123518...ABCER...XYXXY...

When I open the file in Excel, the results are all mashed into a single cell.
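The mashing is consistent with how awk's `printf` works. It writes only what its format string produces, so `printf ($i)` emits each field with no tab between fields and no newline at the end of each record. `OFS` is used by `print` between comma-separated arguments (and when `$0` is rebuilt), never by `printf`. Passing `$i` as the format string also means a `%` in the data would be interpreted as a format directive. Below is a minimal sketch of a corrected command. It assumes the step of 13 implied by the sample output (5, 18, ...). It is wrapped in a hypothetical shell function, `trim_columns`, so that it can be tested.

```shell
# trim_columns is a hypothetical helper name; it reads tab-separated rows on stdin.
# Assumption: "every 14th column counting from the fifth" means a step of 13 (5, 18, 31, ...).
trim_columns() {
  awk 'BEGIN { FS = OFS = "\t" }       # tab in, tab out
  {
    out = $1 OFS $2 OFS $3             # columns 1-3
    for (i = 5; i <= NF; i += 13)      # column 5, then 18, 31, ... up to the last field
      out = out OFS $i
    print out                          # print appends ORS, i.e. one newline per row
  }'
}

# Usage (file names from the question):
# trim_columns < test2.txt > test3.txt
```

Looping up to `NF` instead of a hard-coded 10177 keeps the loop tied to however many fields each row actually has.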
In addition, when I try to include the following code in the awk program to get the first 3 columns:

    for (i=0;i<=3;i++) printf "%s ",$i

it just prints out the original input document together with the mashed result. I am not familiar with awk, so I am not sure what causes this issue.
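In awk, `$0` is the whole current record and the fields start at `$1`. A loop that starts at `i=0` therefore prints the entire line first, followed by `$1` to `$3`, which would explain why the original input reappears. The sketch below compares the loop as written (with `|` after each item so the pieces are visible) against one that starts at 1. Both function names are hypothetical helpers used only for illustration.

```shell
# Hypothetical helper: the loop from the question, with "|" marking where each item ends.
loop_from_zero() {
  awk 'BEGIN { FS = "\t" } { for (i = 0; i <= 3; i++) printf "%s|", $i; printf "\n" }'
}

# Hypothetical helper: starting at 1 prints only fields 1-3, tab-separated, one row per line.
loop_from_one() {
  awk 'BEGIN { FS = "\t" } { for (i = 1; i <= 3; i++) printf "%s%s", $i, (i < 3 ? "\t" : "\n") }'
}

# printf 'a\tb\tc\td\n' | loop_from_zero   # whole line first, then a|b|c|
# printf 'a\tb\tc\td\n' | loop_from_one    # a, b, c separated by tabs
```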