Timeline for Filter a large file quickly
Current License: CC BY-SA 3.0
15 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| May 14, 2014 at 12:38 | comment | added | NPSF3000 | | @Lembik Ignore those warnings, they are fine. I understand it needs to run on your system (and it should, thanks to Mono); I am just too busy to debug it myself (flying to Sydney in 10 hours, for a busy few days) |
| May 14, 2014 at 8:43 | comment | added | user9206 | | The warnings are `NPSF3000.cs(134,24): warning CS0219: The variable 'tasks' is assigned but its value is never used` and `NPSF3000.cs(61,30): warning CS0219: The variable 'solved' is assigned but its value is never used`. It is a shame I can't run your code yet. The question does state that my system is Ubuntu and that I need to be able to run the code to test it. |
| May 13, 2014 at 22:32 | comment | added | NPSF3000 | | @Lembik, I don't run Linux and have little time ATM to debug :( What are the warnings? Note: I don't create any threads, I simply queue work for the threadpool to handle for me. |
| May 13, 2014 at 18:33 | comment | added | user9206 | | It looks like you are creating a huge number of threads. Do you mean to do that? |
| May 13, 2014 at 18:14 | comment | added | user9206 | | Can you give me instructions for running it on Linux? I tried `mcs NPSF3000.cs`, which gives a couple of warnings, and then `mono NPSF3000.exe ~/largefile.file`, which gets as far as "Started: Parallel Sort A" but then does nothing, it seems. |
| May 13, 2014 at 12:41 | history | edited | NPSF3000 | CC BY-SA 3.0 | new version of code, with args |
| May 12, 2014 at 23:25 | comment | added | NPSF3000 | | Comparer fixed, results produced. Computation accounts for only about 5 seconds of my thirty (input 12, sorts 5 each), so I'm thinking about my next line of attack. IO is processing at ~100 MB/s, so speedups might be limited. |
| May 12, 2014 at 23:13 | history | edited | NPSF3000 | CC BY-SA 3.0 | added 174 characters in body |
| May 12, 2014 at 14:47 | comment | added | NPSF3000 | | @James_pic thanks for the suggestions, I'll chase them up if I have time. I just cut 40-odd seconds out of my IO, so I can focus again on the sort and computation. |
| May 12, 2014 at 14:43 | history | edited | NPSF3000 | CC BY-SA 3.0 | deleted 774 characters in body |
| May 12, 2014 at 14:38 | comment | added | James_pic | | Another fun option, since you know that the values of B will be uniformly distributed in a specific range, would be to swap binary search for interpolation search, which reduces search time from O(log(n)) to O(log(log(n))). |
| May 12, 2014 at 14:35 | comment | added | James_pic | | More generally, if you're sorting by A and B, there's a faster algorithm than iterating on A and binary searching on B, which is O(n log(n)) (and is effectively a poor man's hash table). You can instead merge-join the two lists, which is O(n). |
| May 12, 2014 at 14:32 | comment | added | James_pic | | You should get around 200 results, more or less irrespective of the size of your input data. I suspect your problem is related to the way you're using binary search in lines 98-102: I suspect you're assuming that `x.A` will come from `sortedA` and `x.B` will come from `sortedB`, whereas in fact both will come from `sortedB`, and this Comparer will produce nonsense results. |
| May 12, 2014 at 14:27 | history | edited | NPSF3000 | CC BY-SA 3.0 | added 1556 characters in body |
| May 12, 2014 at 13:58 | history | answered | NPSF3000 | CC BY-SA 3.0 | |
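The merge-join James_pic suggests in the May 12 comments can be sketched as follows. This is a minimal illustration in Python, not the answer's actual C# code; it assumes both inputs are sorted on the join key and, for simplicity, that keys on each side are unique. Each list is walked once, so the join is O(n + m) rather than the O(n log(n)) of iterating one list and binary searching the other.

```python
def merge_join(a, b):
    """Yield pairs (x, y) with x == y from two sorted lists.

    Assumes a and b are sorted ascending with unique values on each
    side (a simplifying assumption for this sketch). Runs in O(n + m):
    each comparison advances at least one of the two cursors.
    """
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # a's current value is too small; advance a
        elif a[i] > b[j]:
            j += 1          # b's current value is too small; advance b
        else:
            yield (a[i], b[j])  # keys match: emit the pair
            i += 1
            j += 1


# Example: only 3 and 5 appear in both lists.
print(list(merge_join([1, 3, 5, 7], [2, 3, 5, 8])))  # → [(3, 3), (5, 5)]
```

Handling duplicate keys would require emitting the cross product of each run of equal keys, which this sketch deliberately omits.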
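The interpolation-search alternative mentioned in the May 12, 14:38 comment relies on the values being roughly uniformly distributed: instead of probing the midpoint, it estimates the target's position by linear interpolation between the endpoints, giving an expected O(log(log(n))) probes. A hedged Python sketch (again, not the answer's C# implementation):

```python
def interpolation_search(arr, target):
    """Return the index of target in a sorted list of numbers, or -1.

    Expected O(log(log(n))) probes when values are uniformly
    distributed; degrades toward O(n) on skewed data.
    """
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if arr[hi] == arr[lo]:
            pos = lo  # all remaining values equal; avoid division by zero
        else:
            # Estimate where target sits between arr[lo] and arr[hi].
            pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


# Example: 27 sits at index 9 in [0, 3, 6, ..., 99].
print(interpolation_search(list(range(0, 100, 3)), 27))  # → 9
```

On perfectly uniform data (as in the example), the very first interpolated probe often lands on the target, which is where the speedup over binary search comes from.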