6 events
when what by license comment
Sep 13, 2011 at 9:07 comment added MSalters RAM scales well because you can use more machines. I'd change the approach a little: use a single hash for partitioning and fast checks, because that means every machine can cache only the corresponding part of the existing rows. Secondly, if you have N machines, divide the work up into 10*N chunks based on hash_value % (10*N). They're unlikely to be the same size, so every worker machine picks up a new chunk when it's done with the last. This means you don't have to wait for the last machine to finish that one huge chunk.
Sep 12, 2011 at 22:09 comment added NoChance This is a good idea; the only problem may be the RAM requirement. I also think that it would be very fast.
Sep 12, 2011 at 20:46 history edited dagnelies CC BY-SA 3.0
edited body
Sep 12, 2011 at 20:10 history edited dagnelies CC BY-SA 3.0
added 248 characters in body
Sep 12, 2011 at 19:56 comment added yati sagade delightful :) Thanks. I think before accepting this I'll wait for a few more approaches :)
Sep 12, 2011 at 19:45 history answered dagnelies CC BY-SA 3.0
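
The chunking scheme from MSalters' comment (10*N chunks keyed on hash_value % (10*N), with each worker taking a new chunk as soon as it finishes the last) could be sketched roughly as follows. This is only an illustration under assumptions: it supposes the task is to find which incoming rows are not already among the existing rows, it uses threads to stand in for machines, and the names stable_hash, partition and find_new_rows are hypothetical and not taken from the answer or the comments.

```python
import hashlib
import queue
import threading
from collections import defaultdict


def stable_hash(row: str) -> int:
    """Deterministic hash, so every machine maps a row to the same chunk."""
    digest = hashlib.sha1(row.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition(rows, n_machines, chunks_per_machine=10):
    """Group rows into chunks_per_machine * n_machines chunks by hash value."""
    n_chunks = chunks_per_machine * n_machines
    chunks = defaultdict(list)
    for row in rows:
        chunks[stable_hash(row) % n_chunks].append(row)
    return chunks


def find_new_rows(existing_rows, incoming_rows, n_machines):
    """Return the incoming rows that do not appear in existing_rows (assumed task)."""
    existing = partition(existing_rows, n_machines)
    incoming = partition(incoming_rows, n_machines)

    # Shared pool of chunk ids; chunks are uneven, so workers pull on demand.
    todo = queue.Queue()
    for chunk_id in incoming:
        todo.put(chunk_id)

    new_rows = set()
    lock = threading.Lock()

    def worker():
        # Take the next chunk as soon as the previous one is finished.
        while True:
            try:
                chunk_id = todo.get_nowait()
            except queue.Empty:
                return
            # Only the matching part of the existing rows has to be held in memory.
            seen = set(existing.get(chunk_id, []))
            fresh = {row for row in incoming[chunk_id] if row not in seen}
            with lock:
                new_rows.update(fresh)

    # Threads stand in for worker machines in this sketch.
    threads = [threading.Thread(target=worker) for _ in range(n_machines)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return new_rows
```

Because a row's chunk depends only on its hash, a worker never needs existing rows from any other chunk, and having ten times more chunks than workers keeps one oversized chunk from holding up the whole run.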