We have several 90TB servers (Areca RAID-6 partitioned into 10 ext4 partitions).
The application reads a specialized hardware device and writes data to disk at a constant 8 to 10 MBytes/second. (the app runs separate reader and writer threads with a memory buffer between)
Once in a while file operations slow to a crawl, eg, fwrite() taking 10-30 seconds to write 200 bytes of data, ftell() taking a similar time to get the file position, etc.
Sometimes (but not always), when the system slows down top shows all processes with 0 %CPU, similar to this serverfault posting.
The slowdown will last a few minutes and then the machine will be fine. Later it will happen again.
There is no "daily frequency" of the slowdown so I've eliminated something running via cron / anacron etc.
However, during a slowdown the system load goes from 0.4 (normal) to 3-4. iotop often shows kworker/u16:0+flush-8:112 with very high I/O %. top, vmstat etc show very high %wa, so some I/O bottleneck somewhere.
I've changed the I/O scheduler to mq-deadline.
Also, the application runs fsync every 5 seconds on the file system containing the open file being written.
I've been googling for days on what might be causing this slowdown to find what parameters could be adjusted but no luck so far.
Any suggestions on how to tune Linux for this constant-write scenario?
Centos 8
# uname -a Linux 4.18.0-147.el8.x86_64 #1 SMP Wed Dec 4 21:51:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux IOTOP OUTPUT
Total DISK READ : 161.32 K/s | Total DISK WRITE : 0.00 B/s Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND b' 10666 rt/0 root 161.32 K/s 0.00 B/s 0.00 % 4.43 % MyWritingApp' Total DISK READ : 239.58 K/s | Total DISK WRITE : 78.27 M/s Actual DISK READ: 239.58 K/s | Actual DISK WRITE: 2.20 M/s PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND b' 431412 be/4 root 135.01 K/s 0.00 B/s 0.00 % 44.80 % [kworker/u16:8-flush-8:32]' b' 10666 rt/0 root 104.57 K/s 78.11 M/s 0.00 % 1.88 % MyWritingApp' b' 8853 be/3 root 0.00 B/s 9.27 K/s 0.00 % 0.01 % [jbd2/sdh1-8]' b' 1123 ?dif mysql 0.00 B/s 149.57 K/s 0.00 % 0.00 % mysqld --basedir=/usr' Total DISK READ : 388.68 K/s | Total DISK WRITE : 14.98 M/s Actual DISK READ: 469.32 K/s | Actual DISK WRITE: 1899.42 K/s PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND b' 338764 be/4 root 388.68 K/s 0.00 B/s 0.00 % 89.25 % [kworker/u16:0+flush-8:112]' b' 1123 ?dif mysql 0.00 B/s 955.83 K/s 0.00 % 0.00 % mysqld --basedir=/usr' b' 2740 be/4 apache 0.00 B/s 1353.76 B/s 0.00 % 0.00 % httpd -DFOREGROUND' b' 10664 be/4 root 0.00 B/s 1353.76 B/s 0.00 % 0.00 % php /usr/local/dvstor/bin64/DvStorStartRecording.php mode=ASI tsNum=0' b' 10666 rt/0 root 0.00 B/s 14.04 M/s 0.00 % 0.00 % MyWritingApp' Total DISK READ : 609.63 K/s | Total DISK WRITE : 15.87 K/s Actual DISK READ: 609.63 K/s | Actual DISK WRITE: 31.74 K/s PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND b' 338764 be/4 root 601.70 K/s 0.00 B/s 0.00 % 99.43 % [kworker/u16:0+flush-8:112]' b' 711 be/4 root 7.93 K/s 0.00 B/s 0.00 % 3.00 % systemd-journald' b' 9246 be/3 root 0.00 B/s 7.93 K/s 0.00 % 0.00 % [jbd2/sdi1-8]' b' 933 be/3 root 0.00 B/s 1354.15 B/s 0.00 % 0.00 % auditd' b' 1966 be/4 root 0.00 B/s 2.64 K/s 0.00 % 0.00 % rsyslogd -n' b' 2740 be/4 apache 0.00 B/s 1354.15 B/s 0.00 % 0.00 % httpd -DFOREGROUND' VMSTAT OUTPUT
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu-------- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 2 194048 334164 612708 12537168 0 1 14 1114 13 217 1 2 95 1 0 0 2 194048 329368 614000 12540308 0 0 646 0 5136 6152 1 1 78 19 0 0 2 194048 324640 615288 12543276 0 0 622 96 5249 6541 1 2 79 18 0 0 2 194048 321584 616624 12546188 0 0 664 512 5660 6815 2 2 81 15 0 0 2 194048 317788 617884 12549124 0 0 626 52 6079 7983 2 3 80 15 0 15930768 K total memory 2445452 K used memory 2673404 K active memory 11678768 K inactive memory 318308 K free memory 617888 K buffer memory 12549120 K swap cache 8212476 K total swap 194048 K used swap 8018428 K free swap I/O Scheduler
# for d in {b..k}; do echo -n "sd${d} " ; cat /sys/block/sd${d}/queue/scheduler; done sdb [mq-deadline] kyber bfq none sdc [mq-deadline] kyber bfq none sdd [mq-deadline] kyber bfq none sde [mq-deadline] kyber bfq none sdf [mq-deadline] kyber bfq none sdg [mq-deadline] kyber bfq none sdh [mq-deadline] kyber bfq none sdi [mq-deadline] kyber bfq none sdj [mq-deadline] kyber bfq none sdk [mq-deadline] kyber bfq none