I have a ZFS server where I ran a couple of dumb tests just for understanding, and the results puzzle me.
Context: FreeBSD 11.2, ZFS with compression enabled, SAS HDDs, RAIDz2, 768 GB of memory.
Both commands were run directly on the FreeBSD server.
# time dd if=/dev/random of=./test_file bs=128k count=131072
131072+0 records in
131072+0 records out
17179869184 bytes transferred in 135.191596 secs (127077937 bytes/sec)
0.047u 134.700s 2:15.19 99.6% 30+172k 4+131072io 0pf+0w
#
# The resulting file size:
# du -sh test_file
 16G    test_file

This shows that I was able to write a 16 GiB file of random data in about 135 secs, with a throughput of approx. 121 MiB/s (127 MB/s).
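As a sanity check, comparing the file's apparent size with the space it actually occupies, and checking the dataset's compression ratio, should confirm that random data does not compress at all. This is just a sketch; "tank/test" stands in for whatever dataset holds the file:

# du -Ash test_file
# du -sh test_file
# zfs get compression,compressratio tank/test

If both du figures are essentially 16G and compressratio stays near 1.00x, compression is not helping dd here.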
Now, I try to use fio:
# fio --name=seqwrite --rw=write --bs=128k --numjobs=1 --size=16G --runtime=120 --iodepth=1 --group_reporting
seqwrite: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=1
fio-3.6
Starting 1 process
seqwrite: Laying out IO file (1 file / 16384MiB)
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=2482MiB/s][r=0,w=19.9k IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=58575: Wed Jul 25 09:38:06 2018
  write: IOPS=19.8k, BW=2478MiB/s (2598MB/s)(16.0GiB/6612msec)
    clat (usec): min=28, max=2585, avg=48.03, stdev=24.04
     lat (usec): min=29, max=2586, avg=49.75, stdev=25.19
    bw (  MiB/s): min= 2295, max= 2708, per=99.45%, avg=2464.33, stdev=124.56, samples=13
    iops        : min=18367, max=21664, avg=19714.08, stdev=996.47, samples=13

---------- Trimmed for brevity -------------

Run status group 0 (all jobs):
  WRITE: bw=2478MiB/s (2598MB/s), 2478MiB/s-2478MiB/s (2598MB/s-2598MB/s), io=16.0GiB (17.2GB), run=6612-6612msec

Now I hit 2478 MiB/s of throughput while writing the same 16 GiB file with random data.
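To check whether those writes are actually reaching the disks during the run, rather than just landing in memory and being flushed afterwards, I could watch the pool while fio is running. A rough sketch, with "tank" as a placeholder pool name:

# zpool iostat -v tank 1

If the per-vdev write bandwidth reported there adds up to far less than 2478 MiB/s during the fio run, most of what fio reports is the speed of writing into RAM.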
Why is there such a big difference? My understanding is that the dd command must have used a create call to create the file, then issued open and write calls to write the random data into the open file, and finally closed the file. I chose a block size of 128 KiB to match the ZFS default record size.
The fio test should be measuring just the write calls, with everything else the same. Why is there so much difference in throughput?
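One way to verify that assumption would be to trace the system calls both tools actually make, e.g. with truss on a much smaller run (file names and sizes below are only illustrative, to keep the traces short):

# truss -o dd.trace dd if=/dev/random of=./trace_test bs=128k count=8
# truss -o fio.trace fio --name=tracewrite --rw=write --bs=128k --numjobs=1 --size=1M --iodepth=1

Both traces should boil down to a stream of 128 KiB write calls on the data file, which is why I expected comparable numbers.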
To confuse me even further, if I ask fio to create a file with 50% compressibility, the throughput drops to 847 MiB/s. I understand there is CPU work involved in compression, causing a throughput drop, but I was hoping its impact would be neutralised by having nearly half the amount of data to write. Any ideas why the impact is this high?
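To see how much CPU the compression itself eats during the run, I could watch the kernel threads while fio is writing; a rough sketch:

# top -SH

If ZFS write-issue kernel threads (which do the compression) sit near 100% of a core while the disks are far from saturated, the bottleneck is CPU rather than the halved write volume.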
Command used to run fio with 50% compressibility:
fio --name=seqwrite --rw=write --bs=128k --numjobs=1 --size=16G --runtime=60 --iodepth=1 --buffer_compress_percentage=50 --buffer_pattern=0xdeadbeef --group_reporting
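I ran fio from the command line as shown above, with no separate job file; written out as a job file, those options should correspond to roughly this (my own transcription, nothing fio generated):

[seqwrite]
rw=write
bs=128k
numjobs=1
size=16G
runtime=60
iodepth=1
buffer_compress_percentage=50
buffer_pattern=0xdeadbeef
group_reporting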
Comments:

- Do not know about fio, interesting question.
- Have you tried --buffer_compress_percentage=0?
- I tried --buffer_compress_percentage=0 and now I get a throughput of 139 MiB/s. I expected the result to be similar to not asking for buffer_compress_percentage at all (2478 MiB/s), but the results differ wildly.
- --buffer_compress_percentage=0 tells fio to use 100% random data, so the fact that the throughput is about the same as dd if=/dev/random is a good thing. Can you add the contents of your fio job file to your question? It looks like fio is not using /dev/random as its source in your first run, but is instead using highly compressible data.