Timeline for How can I improve the speed of scanning multiple directories recursively at the same time?
Current License: CC BY-SA 4.0
16 events
| when | what | | by | license | comment |
|---|---|---|---|---|---|
| May 27, 2020 at 21:00 | history | tweeted | twitter.com/StackSoftEng/status/1265749654647472130 | ||
| Apr 12, 2020 at 17:51 | comment | added | candied_orange | On a physical hard drive, seek time depends not only on where the file is physically located but also on where the head was before you asked for it. Keep in mind you're likely not the only thing moving the head. To see how much this is affecting you, set up a RAM disk and test against that. It should stabilize your variance (and be orders of magnitude faster). It doesn't solve the problem, but it makes clear where it lies. | |
| Apr 12, 2020 at 9:52 | comment | added | Doc Brown | @tera_789: the CPU usage you observe is typical of an I/O-bound problem, so it is not very surprising. | |
| Apr 12, 2020 at 9:36 | comment | added | tera_789 | @DocBrown The thing is that CPU usage never goes really high when I run these scans (it is mostly 20-30%), sometimes even less. | |
| Apr 12, 2020 at 9:17 | answer | added | gnasher729 | timeline score: 0 | |
| Apr 12, 2020 at 9:16 | comment | added | tera_789 | @DocBrown Yeah, I see that... it fluctuates a lot, so it is hard to make a decision. At this point, I am starting to think that either the network or the storage device's OS is playing a big role here... sometimes ThreadPoolExecutor is faster (as in most tests), and sometimes ProcessPoolExecutor is... | |
| Apr 12, 2020 at 9:11 | comment | added | Doc Brown | @tera_789: the test results don't seem to support what you wrote in your question - for 2 of them, ProcessPoolExecutor is faster, but for 3 of them, ThreadPoolExecutor. | |
| Apr 12, 2020 at 8:21 | history | edited | tera_789 | CC BY-SA 4.0 | added test results |
| Apr 12, 2020 at 8:17 | comment | added | tera_789 | @DocBrown I added test results | |
| Apr 12, 2020 at 8:16 | history | edited | tera_789 | CC BY-SA 4.0 | added test results |
| Apr 12, 2020 at 7:43 | comment | added | Euphoric | My gut instinct in this situation is that you cannot do any optimization unless you work at the OS or even the HW layer. Maybe try to find an OS API that returns the size of the whole directory, instead of computing it yourself? This would give the OS a way to use its own optimizations. | |
| Apr 12, 2020 at 7:21 | history | edited | Doc Brown | CC BY-SA 4.0 | Fixed wrong usage of the term parallelism |
| Apr 12, 2020 at 7:01 | answer | added | Tfry | timeline score: 4 | |
| Apr 12, 2020 at 6:57 | answer | added | Netch | timeline score: 2 | |
| Apr 11, 2020 at 22:30 | review | First posts | |||
| Apr 12, 2020 at 11:25 | |||||
| Apr 11, 2020 at 22:29 | history | asked | tera_789 | CC BY-SA 4.0 |
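
The comments above compare `ThreadPoolExecutor` and `ProcessPoolExecutor` for what Doc Brown describes as an I/O-bound scan. As a rough illustration only, here is a minimal sketch of the thread-pool variant. It assumes the goal is the total size of each directory tree (as Euphoric's comment suggests); the names `dir_size` and `scan_all` and the `max_workers` default are hypothetical and do not come from the question.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def dir_size(root):
    """Total size in bytes of regular files under root (hypothetical helper)."""
    total = 0
    stack = [root]  # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
        except OSError:
            # Skip directories that vanish or cannot be read.
            continue
    return total


def scan_all(roots, max_workers=8):
    """Scan several directory trees concurrently; returns {root: size_in_bytes}."""
    roots = list(roots)  # allow any iterable; it is traversed twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sizes = pool.map(dir_size, roots)  # one task per top-level directory
        return dict(zip(roots, sizes))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` leaves the rest of the sketch unchanged, provided the call is made under an `if __name__ == "__main__":` guard. That makes it easy to time both against a RAM disk, as candied_orange suggests, and see whether the variance comes from the storage or the network rather than from the executor.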