Timeline for Filter a large file quickly
Current License: CC BY-SA 3.0
15 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Sep 5, 2014 at 16:25 | comment added | Keith Randall | | @Lembik: the code as is only sorts B*R = 27-bit numbers. You now have 29-bit numbers, so you need one more pass (R++) or one more bit per pass (B++). B++ is probably easier; R is hardcoded in a few unrolled loops. |
| Sep 5, 2014 at 12:03 | comment added | user9206 | | I was just going back to this code and changed the data creation to `nolines = 5000000` and then `print random.randint(0,100*nolines-1), random.randint(0,100*nolines-1), random.randint(0,100*nolines-1)`. That is, I multiplied the range by 100. Your code now runs very slowly. Do you know why, by any chance? |
| May 15, 2014 at 4:34 | comment added | primo | | Absurdly fast. I couldn't even get parse time under 30s! |
| May 14, 2014 at 21:14 | comment added | wolfhammer | | For some reason a for loop was even faster: `for(temp=0;*p != ' ';temp = 10 * temp + (*(p++) - '0')) {} lines[i].a = temp; p++;` |
| May 14, 2014 at 21:02 | comment added | wolfhammer | | I noticed faster parsing times when doing the integer calculation in one line. Try: `10 * temp + (*(p++) - '0');` |
| May 14, 2014 at 20:19 | comment added | user9206 | | Your entry was just as fast as James_pic's, but I couldn't award the bounty to both, so I gave it to the first one. I hope that's OK. I am about to post a follow-up question which I hope you will find fun too. |
| May 14, 2014 at 19:46 | vote accept | CommunityBot | | moved from User.Id=9206 by developer User.Id=3572 |
| May 14, 2014 at 9:53 | comment added | user9206 | | @Geobits The top two submissions basically take the same amount of time. You could submit yours and we would have a three-way race :) (Also, I plan to post a follow-up challenge soon on the same topic.) |
| May 13, 2014 at 21:51 | comment added | Geobits | | Well damn. I literally just created filter.c to do the same thing, came to the question and found this. +1 |
| May 13, 2014 at 17:18 | comment added | Keith Randall | | Radix sort works well in cache because there is one stream of reads and N streams of writes (in my code, N = 512). As long as your cache has N+1 cache lines, everything can remain in cache. |
| May 13, 2014 at 9:42 | comment added | James_pic | | YES! I love it. I had a feeling that cache locality would make joining on T faster, but I always figured the sort stage would offset any gains. Using radix sort pretty much eliminates that. |
| May 13, 2014 at 8:19 | comment added | user9206 | | This is equal first. I am surprised radix sort is so fast, as normally you think of it as having terrible cache performance. I think I will need to test in single-user mode to tell them apart, as the timings are not exactly the same on each run, even with the same test file. |
| May 13, 2014 at 4:51 | history edited | Keith Randall | CC BY-SA 3.0 | fix spacing |
| May 13, 2014 at 4:45 | history edited | Keith Randall | CC BY-SA 3.0 | use a bcount array to quickly eliminate most window searches |
| May 13, 2014 at 4:08 | history answered | Keith Randall | CC BY-SA 3.0 | |