The challenge is to filter a large file quickly.
- Input: Each line has three space separated positive integers.
- Output: All input lines
AB,Tthat satisfy either of the following criterion.
- There exists another input line
C,D,UwhereD = Aand0 <= T - U < 100. - There exists another input line
C,D,UwhereB = Cand0 <= U - T < 100.
To make a test file use the following python script which will also be used for testing. It will make a 1.3G file. You can of course reduce nolines for testing.
import random nolines = 50,000,000 for i in xrange(nolines): print random.randint(0,nolines-1), random.randint(0,nolines-1), random.randint(0,nolines-1) Rules. The fastest code when tested on an input file I make using the above script on my computer wins. The deadline is one week from today.
My Machine The timings will be run on my machine. This is a standard 8GB RAM ubuntu install on an AMD FX-8350 Eight-Core Processor. This also means I need to be able to run your code.
Some relevant timing information
real 0m18.525s user 0m18.214s sys 0m0.269s time sort -n test.file > /dev/null real 1m19.986s user 2m1.918s sys 0m5.895s