
  • +1 for stating the bleeding obvious I should've seen when writing my answer. D'oh! :) Commented Jun 15, 2009 at 14:20
  • True; I was doing it without a temporary file, but it might be a little more efficient with one (no LinkedHashSet necessary). But I'd venture a guess that CPU isn't going to be the bottleneck anyway. Commented Jun 15, 2009 at 14:21
  • Er, my comment was directed at Workshop Alex, not gustafc. Commented Jun 15, 2009 at 14:22
  • 1
  • Of course, instead of using an output file, you could output to an unsorted string list, in memory. Then, when you're done adding the input without duplicates, write the string list over the old input file (see the sketch after these comments). It does mean you'll be using twice as much memory as with other solutions, but it's still extremely fast. Commented Jun 16, 2009 at 9:39
  • 1
    That's because it stores the strings twice: once in the hash table and once in the string list. (Then again, chances are that both the hashset and string list only store references to strings, in which case it won't eat up that much.) Commented Jun 16, 2009 at 21:15
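
The approach discussed in these comments can be collapsed into a single structure: a `LinkedHashSet` does the work of both the hash table and the string list at once, since it drops duplicates while preserving the order of first occurrence. Below is a minimal sketch, assuming the whole file fits in memory; the file name `input.txt` is a placeholder, not from the original discussion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

public class DedupeLines {
    public static void main(String[] args) throws IOException {
        Path input = Path.of("input.txt"); // hypothetical file name

        // A LinkedHashSet discards duplicate lines while keeping
        // the original order of first occurrence.
        Set<String> unique = new LinkedHashSet<>(Files.readAllLines(input));

        // Overwrite the input file with the de-duplicated lines,
        // as the comments above suggest.
        Files.write(input, unique);
    }
}
```

As the last comment notes, both a `HashSet` and a string list would only hold references to the same `String` objects, so the separate-list variant does not truly double memory use; the single `LinkedHashSet` avoids even the second collection's entry overhead.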