Timeline for Speed up copying 1000000 small files
Current License: CC BY-SA 4.0
23 events
| when | event | action | by | license | comment |
|---|---|---|---|---|---|
| Dec 21, 2020 at 17:45 | history | edited | Ole Tange | CC BY-SA 4.0 | added 143 characters in body |
| Dec 21, 2020 at 2:47 | comment | added | Rodrigo | Executing. It would be great if you added it to the question, and even better if you explained exactly how it works. | |
| Dec 21, 2020 at 0:39 | comment | added | Ole Tange | @Rodrigo The files are just text files sized 4-20 kB: `seq 10000 > a; seq 1000000 \| parallel --bar 'head -c{=$_=int(rand()*16)+4=}k a > {}'`. | |
| Dec 21, 2020 at 0:29 | comment | added | Rodrigo | Maybe you could share those million small files? Or suggest a script to create them? And from what I see, you're copying them to the same disk? Anyway, different benchmarks (same/different drives) would be better. | |
| Dec 20, 2020 at 23:57 | comment | added | Ole Tange | @Rodrigo Feel free to post an answer with your measurements. | |
| Dec 20, 2020 at 23:56 | comment | added | Rodrigo | @OleTange It depends where you're copying your files to. If to a USB stick, then I think it's faster. | |
| Dec 20, 2020 at 23:51 | comment | added | Ole Tange | @Rodrigo Is that faster? You need to include unpacking the zip-file in your timings. | |
| Dec 20, 2020 at 16:59 | comment | added | Rodrigo | Is just zipping the whole directory and then copying the zip file not an option? | |
| Sep 15, 2017 at 18:34 | comment | added | Ole Tange | @the8472 Can you update your benchmark section with a test where you do the actual copying and compare to the fastest solution shown here? (i.e. read and write to same single spindle) | |
| Sep 15, 2017 at 15:47 | comment | added | the8472 | I wrote fastar, which combines the fiemap-optimized traversal with multi-file readahead for additional speedup when dealing with many small files. | |
| Feb 19, 2017 at 11:09 | comment | added | nh2 | I wrote a program that orders files by their extent number (more likely to match their occurrence on disk) here: github.com/nh2/diskorder | |
| Apr 13, 2014 at 21:10 | vote | accept | Ole Tange | | |
| Apr 13, 2014 at 21:09 | history | edited | Ole Tange | CC BY-SA 3.0 | added 1026 characters in body |
| Apr 13, 2014 at 20:59 | answer | added | Graeme | timeline score: 3 | |
| Apr 13, 2014 at 15:21 | answer | added | mikeserv | timeline score: 6 | |
| Apr 13, 2014 at 13:20 | comment | added | maxschlepzig | btw, I don't think it helps if you execute hard disk accesses in parallel when you want to minimize disk seeks. | |
| Apr 13, 2014 at 13:11 | answer | added | maxschlepzig | timeline score: 11 | |
| Apr 13, 2014 at 12:51 | history | edited | Ole Tange | CC BY-SA 3.0 | deleted 10 characters in body |
| Apr 13, 2014 at 12:46 | history | edited | Ole Tange | CC BY-SA 3.0 | added 405 characters in body |
| Apr 13, 2014 at 11:44 | comment | added | Ole Tange | Only files, but not the same filesystem: `cp -r /mnt/dir1 /mnt2/dirdest` | |
| Apr 13, 2014 at 10:26 | comment | added | maxschlepzig | What exact command are you using for copying? Something like `cp -r /mnt/dir1 /mnt/dirdest` or `cp /mnt/dir1/* /mnt/dirdest`? | |
| Apr 13, 2014 at 10:18 | comment | added | Joseph R. | Does your directory contain only files? Is your target location on the same filesystem? | |
| Apr 13, 2014 at 10:13 | history | asked | Ole Tange | CC BY-SA 3.0 | |
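For reference, the Dec 21, 2020 comment above describes how the test data was generated. Below is a scaled-down sketch of that generator: 100 files instead of 1000000, a deterministic size derived from the index in place of the random 4-19 kB pick (so it runs without GNU parallel and is reproducible), and an illustrative `/tmp/smallfiles` path.

```shell
# Scaled-down sketch of the test-data generator from the comments:
# text files of 4-20 kB each. The original one-liner used GNU parallel
# and a random size; here the size is derived from the index so the
# run is reproducible without extra tools.
set -e
mkdir -p /tmp/smallfiles
cd /tmp/smallfiles
seq 10000 > a                    # ~49 kB of source text, as in the comment
for i in $(seq 1 100); do        # the real test used 1000000 files
    size=$(( (i % 16) + 4 ))     # 4..19 (kB), mirroring int(rand()*16)+4
    head -c "${size}k" a > "$i"  # GNU head accepts a k (KiB) size suffix
done
```

The original one-liner achieves the same spread with `parallel --bar 'head -c{=$_=int(rand()*16)+4=}k a > {}'`, where the `{=...=}` replacement string evaluates Perl code per job.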
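The Dec 2020 thread also debates whether archiving the directory and copying the archive beats a direct copy. As Ole Tange notes, a fair comparison must include the unpacking time. A sketch of that benchmark shape, using `tar` as a stand-in for zip, a tiny sample tree, and throwaway temp directories (all illustrative assumptions, not the question's actual data):

```shell
# Sketch of the benchmark shape the comments ask for: to compare an
# archive-based copy fairly against plain cp, pack AND unpack time
# must both be counted. tar stands in for zip here; the sample tree
# is tiny and lives in throwaway temp directories.
set -e
SRC=$(mktemp -d); DST_CP=$(mktemp -d); DST_TAR=$(mktemp -d)
for i in $(seq 1 50); do seq 200 > "$SRC/$i"; done   # 50 small text files

time cp -r "$SRC"/. "$DST_CP"/                       # baseline: direct copy

# Archive approach: pack and unpack are timed together via a pipe,
# so the unpacking cost is included in the measurement.
time sh -c "tar -C '$SRC' -cf - . | tar -C '$DST_TAR' -xf -"
```

On the same spindle the pipe mainly changes the read/write access pattern; whether it wins depends on the target medium, which is exactly Rodrigo's point about USB sticks.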