  • I like the idea of not having to know the first line, since it makes this a generalized script for your toolbox. Commented Jan 27, 2016 at 21:53
  • 1
    That awk method creates an empty/false array entry per distinct line. For 4M lines, if they are all different (not clear from the question) and fairly short (it appears so), this is probably okay, but if there are many more or much longer lines, this could thrash or die. !($0 in a) tests without creating an entry and avoids this, or awk can do the same logic as you have for perl: '$0!=x; NR==1{x=$0}', or, if the header line can be empty, 'NR==1{x=$0;print} $0!=x'. Commented Jan 28, 2016 at 8:57
  • 1
    @dave_thompson_085 where is an array per line created? You mean !a[$0]? Why would that create an entry in a? Commented Jan 28, 2016 at 10:27
  • 1
    Because that's how awk works; see gnu.org/software/gawk/manual/html_node/… especially the "NOTE". Commented Jan 29, 2016 at 2:53
  • 1
    @dave_thompson_085 well I'll be damned! Thanks, I was not aware of that. Fixed now. Commented Jan 29, 2016 at 9:57
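
To make the point in the comments above concrete, here is a minimal sketch that counts how many array entries each test leaves behind. The sample input and the function names (sample, count_subscript, count_in, dedup_header) are made up for illustration and are not from the original answer; the awk one-liner in dedup_header is the one suggested in the comment.

```shell
#!/bin/sh
# Hypothetical sample input: a header line that repeats in the middle of the data.
sample() { printf 'header\nfoo\nheader\nbar\n'; }

# Subscript test: merely referencing a[$0] creates an empty entry
# for every distinct line that is looked up.
count_subscript() {
  sample | awk 'NR==1 {a[$0]=1; next}
                !a[$0] {kept++}
                END {n=0; for (k in a) n++; print n}'
}

# Membership test: ($0 in a) checks for the key without creating it.
count_in() {
  sample | awk 'NR==1 {a[$0]=1; next}
                !($0 in a) {kept++}
                END {n=0; for (k in a) n++; print n}'
}

# Array-free variant from the comment: remember the first line in x
# and print only lines that differ from it.
dedup_header() {
  sample | awk 'NR==1 {x=$0; print} $0!=x'
}

count_subscript   # prints 3 (header, foo, bar)
count_in          # prints 1 (header only)
dedup_header      # prints header, foo, bar on separate lines
```

With only four lines the difference is trivial, but on a file with millions of distinct lines the subscript form keeps one entry per distinct line in memory, while the other two forms keep a single value.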