6 events
when what by license comment
Dec 23, 2017 at 12:39 history edited Chris Davies CC BY-SA 3.0
added 18 characters in body
Dec 22, 2017 at 23:17 comment added Chris Davies @Borna that sounds like an interesting question in its own right. When you've asked it, I'd appreciate a ping back here with the reference, and I'll take a look.
Dec 22, 2017 at 20:42 comment added Boy That is exactly what I was looking for, thank you sir! One question: I was wondering how efficient it would be to create n files in a single directory (under Linux), where each file name is a row from the 'non-unique-lines' file (let's say there are no illegal characters for a file name), thus eliminating duplicate rows.
Dec 21, 2017 at 19:35 comment added Chris Davies @Borna why would you want a hash table when merging multiple pre-sorted files? These external merge-sort algorithms have been around since the days of magnetic tape - at least 50 years ago.
Dec 21, 2017 at 19:15 comment added Boy How to merge? To be able to merge in reasonable time, we need some lookup logic, e.g. a hash table, but then we again face the same problem -> not enough memory to store a huge hash table.
Mar 19, 2015 at 13:54 history answered Chris Davies CC BY-SA 3.0
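
The point made in the Dec 21, 2017 reply (merging pre-sorted files needs no hash table) can be illustrated with a minimal sketch. It assumes each input is already sorted and yields one line at a time; the function name `merge_unique` is hypothetical and is not taken from the answer. Only the current line of each input and the last emitted line are held in memory, so duplicates are dropped by comparing neighbours rather than by lookup.

```python
import heapq
from typing import Iterable, Iterator


def merge_unique(*sorted_inputs: Iterable[str]) -> Iterator[str]:
    """Merge already-sorted inputs and drop duplicate lines.

    Hypothetical helper for illustration. heapq.merge keeps only one
    pending item per input, so memory use grows with the number of
    inputs, not with their total size.
    """
    previous = None
    for line in heapq.merge(*sorted_inputs):
        # In sorted order, equal lines are adjacent, so comparing with
        # the last emitted line is enough to detect a duplicate.
        if line != previous:
            yield line
            previous = line


if __name__ == "__main__":
    # Small in-memory stand-ins for pre-sorted chunk files.
    chunk_a = ["apple", "banana", "banana"]
    chunk_b = ["banana", "cherry"]
    chunk_c = ["apple", "date"]
    for line in merge_unique(chunk_a, chunk_b, chunk_c):
        print(line)
```

Open file objects iterate line by line, so the same function should work on sorted chunk files passed in place of the lists, provided every line ends with a newline so that equal lines compare equal.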