Timeline for deduplication of lines in a large file
Current License: CC BY-SA 3.0
6 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| Dec 23, 2017 at 12:39 | history | edited | Chris Davies | CC BY-SA 3.0 | added 18 characters in body |
| Dec 22, 2017 at 23:17 | comment | added | Chris Davies | | @Borna that sounds like an interesting question in its own right. When you've asked it I'd appreciate a ping back here with the reference and I'll take a look |
| Dec 22, 2017 at 20:42 | comment | added | Boy | | That is exactly what I was looking for, thank you sir! One question: I was wondering how efficient it would be to create n files in a single directory (under Linux), where each file name is a row from the 'non-unique-lines' file (let's say no illegal chars for the file name), thereby eliminating duplicate rows. |
| Dec 21, 2017 at 19:35 | comment | added | Chris Davies | | @Borna why would you want a hash table when merging multiple pre-sorted files? These external merge-sort algorithms have been around since the days of magnetic tape - at least 50 years ago. [See the sketch below the table.] |
| Dec 21, 2017 at 19:15 | comment | added | Boy | | How to merge? To be able to merge in reasonable time we need some lookup logic, e.g. a hash table, but then we again face the same problem: not enough memory to store a huge hash table. |
| Mar 19, 2015 at 13:54 | history | answered | Chris Davies | CC BY-SA 3.0 | |
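
The comment exchange above turns on one idea: once the big file has been split into chunks that each fit in memory, sorted, and spilled to disk, the chunks can be merged in a single streaming pass and duplicates dropped as they appear next to each other, with no hash table of all lines ever held in memory. The sketch below illustrates that external merge-sort deduplication under stated assumptions; the function name `dedupe_large_file`, the `chunk_lines` parameter, and the file names are illustrative choices, not taken from the original answer.

```python
import heapq
import itertools
import os
import tempfile


def dedupe_large_file(src_path, dst_path, chunk_lines=1_000_000):
    """Remove duplicate lines from a file too large to sort in memory.

    Phase 1: read the input in runs of at most `chunk_lines` lines,
    sort each run in memory, and spill it to a temporary file.
    Phase 2: k-way merge the pre-sorted runs with heapq.merge; the merged
    stream is globally sorted, so duplicates arrive consecutively and can
    be dropped by comparing against the previous line written.
    """
    run_paths = []
    try:
        with open(src_path) as src:
            while True:
                run = list(itertools.islice(src, chunk_lines))
                if not run:
                    break
                # Ensure every line ends with a newline so sorting a run
                # cannot glue the file's final line onto another one.
                run = [line if line.endswith("\n") else line + "\n" for line in run]
                run.sort()
                fd, path = tempfile.mkstemp()
                with os.fdopen(fd, "w") as tmp:
                    tmp.writelines(run)
                run_paths.append(path)

        runs = [open(p) for p in run_paths]
        try:
            with open(dst_path, "w") as dst:
                previous = None
                for line in heapq.merge(*runs):
                    if line != previous:  # duplicates are adjacent in sorted order
                        dst.write(line)
                        previous = line
        finally:
            for f in runs:
                f.close()
    finally:
        for p in run_paths:
            os.remove(p)


# Example usage (hypothetical file names):
# dedupe_large_file("huge.txt", "huge.deduped.txt")
```

Note that the output is sorted rather than in the original line order, which is inherent to sort-based deduplication. In practice, GNU `sort -u huge.txt > huge.deduped.txt` applies the same external merge strategy discussed in the comments, spilling sorted runs to temporary files when the input exceeds available memory and discarding duplicates during the merge.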