Skip to main content
2 of 18
added 19 characters in body
user avatar
user avatar

Filter a large file quickly

The challenge is to filter a large file quickly.

  • Input: Each line has three space separated positive integers.
  • Output: All input lines A B, Tthat satisfy either of the following criterion.
  1. There exists another input line C, D, U where D = A and 0 <= T - U < 100.
  2. There exists another input line C, D, U where B = C and 0 <= U - T < 100.

To make a test file use the following python script which will also be used for testing. It will make a 1.3G file. You can of course reduce nolines for testing.

import random nolines = 50,000,000 for i in xrange(nolines): print random.randint(0,nolines-1), random.randint(0,nolines-1), random.randint(0,nolines-1) 

Rules. The fastest code when tested on an input file I make using the above script on my computer wins. The deadline is one week from today.

My Machine The timings will be run on my machine. This is a standard 8GB RAM ubuntu install on an AMD FX-8350 Eight-Core Processor. This also means I need to be able to run your code.

Some relevant timing information

real 0m18.525s user 0m18.214s sys 0m0.269s time sort -n test.file > /dev/null real 1m19.986s user 2m1.918s sys 0m5.895s 
user9206