devconf.cz 2016 QEMU Disk I/O: Which Performs Better, Native or Threads? Pradeep Kumar Surisetty, Red Hat, Inc. devconf.cz, February 2016
Outline devconf.cz 2016 ● KVM IO Architecture ● Storage transport choices in KVM ● Virtio-blk Storage Configurations ● Performance Benchmark tools ● Challenges ● Performance Results with Native & Threads ● Limitations ● Future Work
KVM I/O Architecture [Diagram: guest applications → guest file system & block drivers (KVM guest kernel) → hardware emulation (QEMU) with vcpu0 … vcpuN and an iothread → host KVM (kvm.ko), host file systems and block devices, physical drivers, hardware cpu0 … cpuM] QEMU generates I/O requests to the host on the guest's behalf and handles events. Notes: Each guest CPU has a dedicated vcpu thread that uses the kvm.ko module to execute guest code. There is an I/O thread that runs a select(2) loop to handle events. devconf.cz 2016
Storage transport choices in KVM ● Full virtualization: IDE, SATA, SCSI ● Good guest compatibility ● Lots of trap-and-emulate, bad performance ● Paravirtualization: virtio-blk, virtio-scsi ● Efficient guest ↔ host communication through the virtio ring buffer (virtqueue) ● Good performance ● Provides a more virtualization-friendly interface, higher performance ● In the AIO case, io_submit() runs under the global QEMU mutex devconf.cz 2016
Storage transport choices in KVM ● Device assignment (Passthrough) ● Pass hardware to guest, high-end usage, high performance ● Limited Number of PCI Devices ● Hard for Live Migration devconf.cz 2016
Storage transport choices in KVM [Diagram comparing the full-virtualization and para-virtualization I/O paths] devconf.cz 2016
Ring buffer with paravirtualization [Diagram: the guest's virtio PCI controller and virtio device share a vring with QEMU's virtio PCI controller and virtio device; the guest notifies QEMU with a kick]
Virtio-blk-data-plane: ● Accelerated data path for the para-virtualized block I/O driver ● I/O threads are defined with -object iothread,id=<id>, and the user can set up arbitrary device→iothread mappings (multiple devices can share an iothread) ● No need to acquire the big QEMU lock [Diagram: KVM guest, host kernel, QEMU event loop, and virtio-blk-data-plane thread(s) connected via Linux AIO and irqfd] devconf.cz 2016
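As a rough illustration of the device→iothread mapping described above (not taken from the slides; machine size, image path, and IDs are assumptions), a QEMU command line might look like this:

    qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 \
      -object iothread,id=iothread0 \
      -drive file=/var/lib/libvirt/images/vm1.qcow2,if=none,id=drive0,format=qcow2,cache=none,aio=native \
      -device virtio-blk-pci,drive=drive0,iothread=iothread0 \
      ...

Several -device virtio-blk-pci entries could reference the same iothread0, which is the sharing arrangement the bullet above refers to.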
Virtio-blk Storage Configurations [Diagram, left: device-backed virtual storage – guest applications and LVM volumes on virtual devices (/dev/vda, /dev/vdb, …) reach host block devices (/dev/sda, /dev/sdb, …) on physical storage through para-virtualized drivers with direct I/O. Right: file-backed virtual storage – the same guest stack backed by a RAW or QCOW2 file on the host server.] devconf.cz 2016
OpenStack / libvirt: AIO mode for disk devices 1) Asynchronous I/O (aio=native): uses io_submit() calls 2) Synchronous I/O (aio=threads): uses pread64()/pwrite64() calls The default choice in OpenStack is aio=threads* Ref: https://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/libvirt-aio-mode.html * before this problem was addressed devconf.cz 2016
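To make the two modes concrete, here is a minimal stand-alone C sketch (illustrative only, not QEMU code; the target path is supplied by the user and the block size is arbitrary) that issues the same 4 KB read once synchronously, the way the aio=threads worker pool does, and once through Linux AIO io_submit(2)/io_getevents(2), the way aio=native does. Build with something like: gcc -O2 aio_demo.c -laio -o aio_demo.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLK 4096

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s <file-or-device>\n", argv[0]); return 1; }

        /* O_DIRECT mirrors cache='none' in the libvirt XML; it requires aligned buffers. */
        int fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, BLK, BLK)) return 1;

        /* aio=threads path: a plain synchronous pread(), which QEMU runs in a
         * thread-pool worker so the main loop does not block. */
        if (pread(fd, buf, BLK, 0) < 0) perror("pread");

        /* aio=native path: the same read submitted through the kernel AIO interface. */
        io_context_t ctx = 0;
        if (io_setup(8, &ctx) != 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

        struct iocb cb, *cbs[1] = { &cb };
        io_prep_pread(&cb, fd, buf, BLK, 0);
        if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }

        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL);   /* wait for the completion event */
        printf("aio=native style read completed, res=%ld\n", (long)ev.res);

        io_destroy(ctx);
        free(buf);
        close(fd);
        return 0;
    }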
Example XML

● aio=native:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native'/>
  <source file='/home/psuriset/xfs/vm2-native-ssd.qcow2'/>
  <target dev='vdb' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>

● aio=threads:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='threads'/>
  <source file='/home/psuriset/xfs/vm2-threads-ssd.qcow2'/>
  <target dev='vdc' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>

devconf.cz 2016
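For the device-backed configurations used later in the results (the "FS: none, used LVM" cases), the analogous disk element references a block device rather than a file; a sketch, with the logical-volume path as an assumption:

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/vg_guests/lv_vm1'/>
  <target dev='vdb' bus='virtio'/>
</disk>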
CPU Usage with aio=Native devconf.cz 2016
CPU Usage with aio=Threads devconf.cz 2016
Multiple layers evaluated with virtio-blk
File systems: ext4, XFS (local); NFS
Disks: SSD, HDD
Images: qcow2, qcow2 (with falloc), qcow2 (with fallocate), raw (preallocated); file- or block-device-backed
Jobs: Seq Read, Seq Write, Rand Read, Rand Write, Rand Read Write
Block sizes: 4k, 16k, 64k, 256k
Number of VMs: 1, 16 (concurrent)
devconf.cz 2016
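The qcow2 "falloc"/"fallocate" and raw "preallocated" variants in this matrix presumably map to qemu-img preallocation modes; a hedged sketch of how such images might have been created (file names and sizes are assumptions):

    # qcow2 with no preallocation, with falloc, and with full preallocation
    qemu-img create -f qcow2 vm-plain.qcow2 20G
    qemu-img create -f qcow2 -o preallocation=falloc vm-falloc.qcow2 20G
    qemu-img create -f qcow2 -o preallocation=full vm-full.qcow2 20G

    # fully preallocated raw image
    qemu-img create -f raw -o preallocation=full vm-raw.img 20G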
Test Environment Hardware ● 2 x Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz ● 256 GiB memory @ 1866 MHz ● 1 x 1 TB NVMe PCI SSD ● 1 x 500 GB HDD Software ● Host: RHEL 7.2 (kernel 3.10.0-327) ● QEMU: 2.3.0-31 + AIO merge patch ● VM: RHEL 7.2 devconf.cz 2016
Tools What is pbench? pbench (perf bench) aims to: ● Provide easy access to benchmarking & performance tools on Linux systems ● Standardize the collection of telemetry and configuration information ● Automate benchmark execution ● Output effective visualizations for analysis ● Allow for ingestion into Elasticsearch devconf.cz 2016
Pbench Continued... Tool visualization: sar tool, total cpu consumption: devconf.cz 2016
Pbench Continued .. tool visualization: iostat tool, disk request size: devconf.cz 2016
Pbench Continued .. tool visualization: proc-interrupts tool, function call interrupts/sec: devconf.cz 2016
Pbench Continued .. tool visualization: proc-vmstat tool, numa stats: entries in /proc/vmstat which begin with “numa_” (delta/sec) devconf.cz 2016
Pbench Continued .. pbench benchmarks example: fio benchmark # pbench_fio --config=baremetal-hdd runs a default set of iterations: [read, rand-read] * [4KB, 8KB, …, 64KB] takes 5 samples per iteration and computes avg, stddev handles start/stop/post-processing of tools for each iteration other fio options: --targets=<devices or files> --ioengine=[sync, libaio, others] --test-types=[read,randread,write,randwrite,randrw] --block-sizes=[<int>,[<int>]] (in KB) devconf.cz 2016
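Combining the options listed above, an invocation matching the block sizes and job types used in these tests might look roughly like this (the target device and config label are assumptions):

    pbench_fio --config=ssd-xfs-native \
               --targets=/dev/vdb \
               --ioengine=libaio \
               --test-types=read,randread,write,randwrite,randrw \
               --block-sizes=4,16,64,256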
FIO: Flexible IO Tester ● IO type: defines the I/O pattern issued to the file(s). We may only be reading sequentially from the file(s), or we may be writing randomly, or even mixing reads and writes, sequentially or randomly. ● Block size: in how large chunks are we issuing I/O? This may be a single value, or it may describe a range of block sizes. ● IO size: how much data are we going to be reading/writing? ● IO engine: how do we issue I/O? We could be memory mapping the file, using regular read/write, using splice, async I/O, syslet, or even SG (SCSI generic sg). ● IO depth: if the I/O engine is async, how large a queue depth do we want to maintain? ● Buffered vs direct: should we be doing buffered I/O, or direct/raw I/O? devconf.cz 2016
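Those knobs map directly onto a fio job file; a minimal sketch of a 4K random-read job broadly similar to the ones run here (filename, size, runtime, and queue depth are assumptions):

    [global]
    ioengine=libaio     ; async I/O engine
    direct=1            ; direct/raw I/O, bypassing the page cache
    bs=4k               ; block size
    iodepth=32          ; queue depth kept by the async engine
    runtime=120
    time_based=1

    [randread-job]
    rw=randread         ; I/O type: random reads
    filename=/dev/vdb   ; target (could also be a file)
    size=8g             ; I/O size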
Guest & host iostat during 4K seq read with aio=native [charts: Guest, Host] devconf.cz 2016
AIO Native ● aio=native uses Linux AIO io_submit(2) for read and write requests; request completion is signaled using eventfd. ● Virtqueue kicks are handled in the iothread: when the guest writes to the virtqueue kick hardware register, the kvm.ko module signals the ioeventfd that the main loop thread is monitoring. ● Requests are collected from the virtqueue and submitted (after write request merging) either via aio=threads or aio=native. ● Request completion callbacks are invoked in the main loop thread and an interrupt is injected into the guest. devconf.cz 2016
Challenges for Read with aio=native ● virtio-blk does *not* merge read requests in qemu-kvm; it only merges write requests. ● QEMU submits each 4 KB request through a separate io_submit() call. ● QEMU would submit only one request at a time even though multiple requests were available to process. ● A batching method was implemented for both virtio-scsi and virtio-blk data plane disks.
Batch Submission What is I/O batch submission? ● Handle more requests in a single system call (io_submit), so the number of io_submit syscalls can be decreased a lot. Abstracted with generic interfaces: ● bdrv_io_plug() / bdrv_io_unplug() ● Merged in commit fc73548e444ae3239f6cef44a5200b5d2c3e85d1 ("virtio-blk: submit I/O as a batch") devconf.cz 2016
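A rough sketch of the batching idea (illustrative C, not QEMU's actual bdrv_io_plug()/bdrv_io_unplug() implementation): instead of one io_submit(2) per request, requests queued while "plugged" are flushed with a single call on "unplug".

    #include <libaio.h>

    /* Hypothetical helper: submit all queued iocbs in one go ("unplug"),
     * rather than calling io_submit() once per request. */
    static int flush_queued_requests(io_context_t ctx, struct iocb **queued, int nr)
    {
        int done = 0;
        while (done < nr) {
            int r = io_submit(ctx, nr - done, queued + done);
            if (r <= 0)
                return r;   /* error handling elided in this sketch */
            done += r;      /* io_submit may accept fewer iocbs than requested */
        }
        return done;
    }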
Performance Comparison Graphs devconf.cz 2016
Results

Test specification: Disk: SSD, FS: none (used LVM), Image: raw, Preallocated: yes
  Single VM: aio=threads has better performance with LVM. 4K read/randread performance is 10-15% higher; 4K write is 26% higher.
  16 VMs: Native & threads perform equally in most cases, but native does better in a few cases.

Test specification: Disk: SSD, FS: EXT4, Image: raw, Preallocated: yes
  Single VM: Native performs well with randwrite, write, and randread-write. Threads' 4K read is 10-15% higher; 4K randread is 8% higher.
  16 VMs: Both have similar results. 4K seq reads: threads 1% higher.

Test specification: Disk: SSD, FS: XFS, Image: raw, Preallocated: yes
  Single VM: aio=threads has better performance.
  16 VMs: Native & threads perform equally in most cases, but native does better in a few cases. Threads better in seq writes.

Test specification: Disk: SSD, FS: EXT4, Image: raw, Preallocated: yes, NFS: yes
  Single VM: Native performs well with randwrite, write, and randread-write. Threads do well with 4K/16K read and randread, by 12%.
  16 VMs: Native & threads perform equally in most cases, but native does better in a few cases.

devconf.cz 2016
Test specification: Disk: SSD, FS: XFS, Image: raw, Preallocated: yes, NFS: yes
  Single VM: Native performs well with all tests except read & randread, where threads perform better.
  16 VMs: Native performs well with all tests.

Test specification: Disk: SSD, FS: EXT4, Image: qcow2, Preallocated: no
  Single VM: Native does well with all tests. Threads outperform native by <10% for read and randread.
  16 VMs: Native is better than threads in most cases. Seq reads are 10-15% higher with native.

Test specification: Disk: SSD, FS: XFS, Image: qcow2, Preallocated: no
  Single VM: Native performs well with all tests except seq read, which is 6% higher.
  16 VMs: Native performs better than threads except seq write, which is 8% higher.

Test specification: Disk: SSD, FS: EXT4, Image: qcow2, Preallocated: with falloc (using qemu-img)
  Single VM: Native is optimal for almost all tests. Threads slightly better (<10%) for seq reads.
  16 VMs: Native is optimal with randwrite, write, and randread-write. Threads have slightly better performance for read and randread.

Test specification: Disk: SSD, FS: XFS, Image: qcow2, Preallocated: with falloc
  Single VM: Native is optimal for write and randread-write. Threads better (<10%) for read and randread.
  16 VMs: Native is optimal for all tests. Threads are better for seq writes.

Test specification: Disk: SSD, FS: EXT4
  Single VM: Native performs better for randwrite, write, and randread-write. Threads do better for read and randread; 4K/16K read and randread are 12% higher.
  16 VMs: Native outperforms threads.
Test specification: Disk: SSD, FS: XFS, Image: qcow2, Preallocated: with fallocate
  Single VM: Native is optimal for randwrite, write, and randread. Threads better (<10%) for read.
  16 VMs: Native optimal for all tests. Threads optimal for randread and 4K seq write.

Test specification: Disk: HDD, FS: none (used LVM), Image: raw, Preallocated: yes
  Single VM: Native outperforms threads in all tests.
  16 VMs: Native outperforms threads in all tests.

Test specification: Disk: HDD, FS: EXT4, Image: raw, Preallocated: yes
  Single VM: Native outperforms threads in all tests.
  16 VMs: Native outperforms threads in all tests.

Test specification: Disk: HDD, FS: XFS, Image: raw, Preallocated: yes
  Single VM: Native outperforms threads in all tests.
  16 VMs: Native is optimal or equal in all tests.

Test specification: Disk: HDD, FS: EXT4, Image: qcow2, Preallocated: no
  Single VM: Native is optimal or equal in all test cases except randread, where threads is 30% higher.
  16 VMs: Native is optimal except for 4K seq reads.

Test specification: Disk: HDD, FS: XFS, Image: qcow2, Preallocated: no
  Single VM: Native is optimal except for seq writes, where threads is 30% higher.
  16 VMs: Native is optimal except for 4K seq reads.

Test specification: Disk: HDD, FS: EXT4
  Single VM: Native is optimal or equal in all cases except randread, where threads is 30% higher.
  16 VMs: Native is optimal or equal in all tests except 4K read.
Test specification: Disk: HDD, FS: XFS, Image: qcow2, Preallocated: with falloc (using qemu-img)
  Single VM: Native is optimal or equal in all cases except seq write, where threads is 30% higher.
  16 VMs: Native is optimal or equal in all cases except for 4K randread.

Test specification: Disk: HDD, FS: EXT4, Image: qcow2, Preallocated: with fallocate
  Single VM: Native is optimal or equal in all tests.
  16 VMs: Native is optimal or equal in all tests except 4K randread, where threads is 15% higher.

Test specification: Disk: HDD, FS: XFS, Image: qcow2, Preallocated: with fallocate
  Single VM: Native is optimal in all tests except for seq write, where threads is 30% higher.
  16 VMs: Native is better. Threads has slightly better performance (<3-4%), excluding randread, where threads is 30% higher.

devconf.cz 2016
Performance Graphs devconf.cz 2016
1. Disk: SSD, Image: raw, Preallocated: yes, VMs: 16 — RandRead [graph] Legend: FS: none (used LVM), aio=native | FS: none (used LVM), aio=threads | FS: EXT4, aio=native | FS: EXT4, aio=threads | FS: XFS, aio=native | FS: XFS, aio=threads devconf.cz 2016
RandReadWrite [graph, same legend] devconf.cz 2016
RandWrite [graph, same legend] devconf.cz 2016
Seq Read [graph, same legend] devconf.cz 2016
Seq Write [graph, same legend] devconf.cz 2016
2. Disk: HDD, Image: raw, Preallocated: yes, VMs: 16 — RandRead, RandReadWrite [graphs] Legend: FS: none (used LVM), aio=native | FS: none (used LVM), aio=threads | FS: EXT4, aio=native | FS: EXT4, aio=threads | FS: XFS, aio=native | FS: XFS, aio=threads devconf.cz 2016
RandWrite, Seq Read, Seq Write [graphs, same legend] devconf.cz 2016
3. Disk: SSD, Image: raw, Preallocated: yes, VMs: 1 — RandRead, RandRead Write [graphs] Legend: FS: none (used LVM), aio=native | FS: none (used LVM), aio=threads | FS: EXT4, aio=native | FS: EXT4, aio=threads | FS: XFS, aio=native | FS: XFS, aio=threads devconf.cz 2016
RandRead Write, Seq Read [graphs, same legend] devconf.cz 2016
Seq Write [graph, same legend] devconf.cz 2016
4. Disk: HDD, Image: raw, Preallocated: yes, VMs: 1 — Rand Read, Rand Read Write, Rand Write [graphs] Legend: FS: none (used LVM), aio=native | FS: none (used LVM), aio=threads | FS: EXT4, aio=native | FS: EXT4, aio=threads | FS: XFS, aio=native | FS: XFS, aio=threads devconf.cz 2016
Seq Read, Seq Write [graphs, same legend] devconf.cz 2016
5. Disk: SSD, Image: qcow2, VMs: 16 RandRead 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
RandReadWrite RandWrite 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
Seq Read SeqWrite 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
6. Disk: HDD, Image: qcow2, VMs: 16 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
RandRead Rand Read Write Rand Write Seq Read 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
7. Disk: SSD, Image: qcow2, VMs: 1 RandRead 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
RandReadWrite RandWrite 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
Seq Read 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
Seq Write 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
8. Disk: HDD, Image: qcow2, VMs: 1 Seq Read 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
Rand Read Write Rand Write Rand Read 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
Seq Write 1 FS: EXT4, aio=native, Img: qcow2 2 FS: EXT4, aio=threads, Img: qcow2 3 FS: XFS, aio=native, Img: qcow2 4 FS: XFS, aio=threads, Img: qcow2 5 FS: EXT4, aio=native, Img: qcow2, Prealloc: Falloc 6 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Falloc 7 FS: XFS, aio=native, Img: qcow2, Prealloc: Falloc 8 FS: XFS, aio=threads, Img: qcow2, Prealloc: Falloc 9 FS: EXT4, aio=native, Img: qcow2, Prealloc: Fallocate 10 FS: EXT4, aio=threads, Img: qcow2, Prealloc: Fallocate 11 FS: XFS, aio=native, Img: qcow2, Prealloc: Fallocate 12 FS: XFS, aio=threads, Img: qcow2, Prealloc: Fallocate devconf.cz 2016
9. Disk: SSD, Image: raw, NFS: yes, VMs: 1 — Rand Read, Rand Read Write [graphs] Legend: 1. FS: EXT4, aio=native 2. FS: EXT4, aio=threads 3. FS: XFS, aio=native 4. FS: XFS, aio=threads devconf.cz 2016
Rand Write, Seq Read, Seq Write [graphs, same legend] devconf.cz 2016
https://review.openstack.org/#/c/232514/7/specs/mitaka/approved/libvirt-aio-mode.rst devconf.cz 2016
Performance Brief ● https://access.redhat.com/articles/2147661 devconf.cz 2016
Conclusion & Limitations ● Throughput increased a lot because the I/O thread needs less CPU to submit I/O ● aio=native is the preferable choice, with a few limitations: ● Native AIO can block the VM if the file is not fully allocated, and is therefore not recommended for use on sparse files. ● Writes to sparsely allocated files are more likely to block than writes to fully preallocated files. Therefore it is recommended to use aio=native only on fully preallocated files, local disks, or logical volumes. devconf.cz 2016
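In practice, the "fully preallocated" requirement can be met when the backing file is created; a hedged example (paths and sizes are assumptions, complementing the qemu-img preallocation sketch earlier):

    # fully preallocated raw image
    qemu-img create -f raw -o preallocation=full /var/lib/libvirt/images/vm1.img 20G

    # or preallocate an existing raw file on the host filesystem
    fallocate -l 20G /var/lib/libvirt/images/vm1.img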
Future work devconf.cz 2016 ● Evaluate virtio data plane performance ● Reduce CPU utilization for aio=threads and consider
Questions devconf.cz 2016
References ● Stefan Hajnoczi Optimizing the QEMU Storage Stack, Linux Plumbers 2010 ● Asias He, Virtio-blk Performance Improvement, KVM forum 2012 ● Khoa Huynch: Exploiting The Latest KVM Features For Optimized Virtualized Enterprise Storage Performance, LinuxCon2012 ● Pbench: http://distributed-system-analysis.github.io/pbench/ https://github.com/distributed-system-analysis/pbench ● FIO: https://github.com/axboe/fio/ devconf.cz 2016
Special thanks to Andrew Theurer and Stefan Hajnoczi
Thanks! IRC: #psuriset Blog: psuriset.com
