I have 1000000 files of 4-20 kB each in a dir. (You can generate similar files like this: seq 10000 | gzip > a; seq 1000000 | parallel --bar 'head -c{=$_=int(rand()*16)+4=}k a > {}')
I need to copy that dir. But it seems I have to do a seek for each file, so this takes quite a while.
Is there a way in which I can speed this up?
I am currently thinking that if I could get the disk blocks that these files occupy, I could sort those, merge the blocks that were close (given that sequential read is often faster than seeking) and read these blocks, so that they were in RAM cache (I have 32 GB RAM) before doing the copy.
But for that to work I need a way to identify which blocks the files are on.
I am using EXT4 on a magnetic device (i.e. not SSD).
Edit:
This ought to work but it does not:
ls | parallel -IOO --pipe "sudo parallel -j100 hdparm --fibmap {}'|tail -n +5'" | sort -nk 2 | perl -ane 'if($u+10000 < $F[1]) { print "$l ",($u-$l),"\n"; $l=$F[1] } $u=$F[2]' | sudo parallel --colsep ' ' dd if=/dev/sda1 skip={1} bs=512 count={2} '| cat >/dev/null'
When testing it on a big file it does not cache the file.
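The merge step in the pipeline above (the perl part) can be sketched in isolation. This is only an illustration on made-up sample data: the function name merge_extents, the two-column "begin end" input format, and the assumption that the end sector is inclusive are mine, not something taken from hdparm's documentation. Unlike the one-liner, this version also prints the last range and does not emit an empty first range.

```shell
#!/bin/sh
# Input: one "begin_sector end_sector" pair per line (hypothetical sample data).
# Output: "skip count" pairs usable as dd skip=/count= with bs=512,
# merging extents whose gap is at most 10000 sectors.
# Assumption: end_sector is inclusive, hence the "+ 1" in the count.
merge_extents() {
  sort -n -k1,1 | awk -v gap=10000 '
    NR == 1        { lo = $1; hi = $2; next }                      # first extent starts a range
    $1 > hi + gap  { print lo, hi - lo + 1; lo = $1; hi = $2; next } # too far away: flush, start new
    $2 > hi        { hi = $2 }                                     # close enough: extend the range
    END            { if (NR > 0) print lo, hi - lo + 1 }           # flush the final range
  '
}

printf '%s\n' '100 107' '50000 50007' '120 127' '50100 50163' | merge_extents
```

With the four sample extents this yields two merged ranges, one starting at sector 100 and one at sector 50000.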
Edit2:
Here are some benchmarks. Cache was flushed (echo 3 >/proc/sys/vm/drop_caches) between each run. Measurements were done with iostat -dkx 5.
rsync -Hav foo/ bar/: 1800 KB/s
cp -a foo/ bar/: 3600 KB/s
cat sort-by-inode | parallel -j1 -X cp foo/{} bar/: 5000 KB/s
cat sort-by-inode | shuf | parallel -j1 -X cp foo/{} bar/: 3000 KB/s
cat sort-by-inode | shuf | parallel -j10 -X cp foo/{} bar/: 7000 KB/s
cat sort-by-inode | parallel -j10 -X cp foo/{} bar/: 8000 KB/s
cat sort-by-inode | parallel -j100 -X cp foo/{} bar/: 9000 KB/s
cat sort-by-inode | parallel -j500 -X cp foo/{} bar/: 10000 KB/s
So what can we learn from that?
It seems sorting by inode is a good idea. But it seems parallelizing multiple cp boosts performance even further. It is worth stressing that the source foo/ is a magnetic disk, so this attacks the myth that parallelizing I/O to a single spindle will not speed up I/O: Parallelizing clearly and consistently speeds up the copying here.
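For anyone wanting to try the inode-sorted, parallel copy: the benchmarks do not show how the sort-by-inode file was made, so the following is only one possible way, assuming GNU find, xargs and cp. It uses xargs -P instead of GNU parallel so it runs without extra packages, and it works on a small throwaway tree (the $work directory and the 50 dummy files are purely for demonstration).

```shell
#!/bin/sh
set -eu

# Hypothetical demo tree; point these at the real foo/ and bar/ instead.
work=$(mktemp -d)
mkdir "$work/foo" "$work/bar"
for i in $(seq 1 50); do
  head -c 4096 /dev/zero > "$work/foo/$i"
done

# Emit "inode name" pairs, sort numerically by inode, keep only the name.
find "$work/foo" -maxdepth 1 -type f -printf '%i %f\n' \
  | sort -n | cut -d' ' -f2- > "$work/sort-by-inode"

# Copy in inode order: up to 10 cp processes at once, 10 files per cp.
# For a million files a larger batch (e.g. -n 1000) is probably more sensible.
(cd "$work/foo" && xargs -d '\n' -n 10 -P 10 cp -t "$work/bar" -- < "$work/sort-by-inode")

ls "$work/bar" | wc -l
```

The batch size and the number of parallel jobs are knobs to experiment with; the figures above suggest more jobs kept helping on this disk, but that may differ elsewhere.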
cp -r /mnt/dir1 /mnt/dirdest or something like cp /mnt/dir1/* /mnt/dirdest?