  • 1
    Why does it need the sort? (a) The file is already sorted on the key field 2, (b) it sorts on the data field 4, and (c) it doesn't need to be ordered anyway, since it range-checks every line independently. Commented Jan 3, 2020 at 14:21
  • 1
    @Paul_Pedant It needs to be sorted for this solution because this particular awk script relies on the first $4 it sees in the range being the min value and the last $4 being the max. Only the sort compares $4 values with each other. Commented Jan 3, 2020 at 14:34
  • @edmorton Conceded, I didn't read the complete post. However, ACGT data suggests DNA sequencer output, so I would expect huge file sizes. I find a sort|awk solution is generally 10 times slower than plain awk for large data: sorting costs O(N log N), while a single awk pass is O(N). Commented Jan 3, 2020 at 16:36
  • @Paul_Pedant I'm not recommending this approach, just explaining why the poster is sorting in response to your question. Commented Jan 3, 2020 at 17:07
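
The trade-off discussed in these comments can be illustrated with a minimal sketch of the sort-free alternative: a single awk pass that tracks the min and max of `$4` itself, so the input order no longer matters. This is not the script from the post. The function name `minmax_in_range`, the bounds `lo` and `hi`, the assumption that the range check applies to field 2, and the sample rows are all hypothetical.

```shell
# Hypothetical helper: print the min and max of field 4 over all lines
# whose key (field 2) lies in [lo, hi]. No sort is needed because every
# value is compared against the running min and max.
minmax_in_range() {
  awk -v lo="$1" -v hi="$2" '
    $2 >= lo && $2 <= hi {
      v = $4 + 0                      # force numeric comparison
      if (!seen || v < min) min = v   # first in-range line seeds min
      if (!seen || v > max) max = v   # ... and max
      seen = 1
    }
    END { if (seen) print min, max }
  '
}

# Made-up sample data: field 2 is the key, field 4 the value.
printf '%s\n' \
  'chr1 100 A 7' \
  'chr1 150 C 3' \
  'chr1 200 G 9' \
  'chr1 900 T 1' |
minmax_in_range 100 200   # prints: 3 9
```

Each line is examined once, so the work grows linearly with the file size, whereas a `sort | awk` pipeline pays for the sort first.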