I have a latency-sensitive application running on an embedded system, and I'm seeing a discrepancy between writing to an ext4 partition and an ext2 partition on the same physical device. Specifically, I see intermittent delays when performing many small updates to a memory map, but only on ext4. I've tried the usual tricks for improving performance (especially reducing variation in latency) by mounting ext4 with different options, and have settled on these mount options:
```
mount -t ext4 -o remount,rw,noatime,nodiratime,user_xattr,barrier=1,data=ordered,nodelalloc /dev/mmcblk0p6 /media/mmc/data
```

barrier=0 didn't seem to provide any improvement.
For the ext2 partition, the following flags are used:
```
/dev/mmcblk0p3 on /media/mmc/data2 type ext2 (rw,relatime,errors=continue)
```

Here's the test program I'm using:
```cpp
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>

uint32_t getMonotonicMillis()
{
    struct timespec time;
    clock_gettime(CLOCK_MONOTONIC, &time);
    uint32_t millis = (time.tv_nsec/1000000)+(time.tv_sec*1000);
    return millis;
}

void tune(const char* name, const char* value)
{
    FILE* tuneFd = fopen(name, "wb+");
    fwrite(value, strlen(value), 1, tuneFd);
    fclose(tuneFd);
}

void tuneForFasterWriteback()
{
    tune("/proc/sys/vm/dirty_writeback_centisecs", "25");
    tune("/proc/sys/vm/dirty_expire_centisecs", "200");
    tune("/proc/sys/vm/dirty_background_ratio", "5");
    tune("/proc/sys/vm/dirty_ratio", "40");
    tune("/proc/sys/vm/swappiness", "0");
}

class MMapper
{
public:
    const char* _backingPath;
    int _blockSize;
    int _blockCount;
    bool _isSparse;
    int _size;
    uint8_t* _data;
    int _backingFile;
    uint8_t* _buffer;

    MMapper(const char* backingPath, int blockSize, int blockCount, bool isSparse) :
        _backingPath(backingPath),
        _blockSize(blockSize),
        _blockCount(blockCount),
        _isSparse(isSparse),
        _size(blockSize*blockCount)
    {
        printf("Creating MMapper for %s with block size %i, block count %i and it is%s sparse\n",
               _backingPath, _blockSize, _blockCount, _isSparse ? "" : " not");
        _backingFile = open(_backingPath, O_CREAT | O_RDWR | O_TRUNC, 0600);
        if(_isSparse)
        {
            ftruncate(_backingFile, _size);
        }
        else
        {
            posix_fallocate(_backingFile, 0, _size);
            fsync(_backingFile);
        }
        _data = (uint8_t*)mmap(NULL, _size, PROT_READ | PROT_WRITE, MAP_SHARED, _backingFile, 0);
        _buffer = new uint8_t[blockSize];
        printf("MMapper %s created!\n", _backingPath);
    }

    ~MMapper()
    {
        printf("Destroying MMapper %s\n", _backingPath);
        if(_data)
        {
            msync(_data, _size, MS_SYNC);
            munmap(_data, _size);
            close(_backingFile);
            _data = NULL;
            delete [] _buffer;
            _buffer = NULL;
        }
        printf("Destroyed!\n");
    }

    void writeBlock(int whichBlock)
    {
        memcpy(&_data[whichBlock*_blockSize], _buffer, _blockSize);
    }
};

int main(int argc, char** argv)
{
    tuneForFasterWriteback();
    int timeBetweenBlocks = 40*1000;
    //2^12 x 2^16 = 2^28 = 2^10*2^10*2^8 = 256MB
    int blockSize = 4*1024;
    int blockCount = 64*1024;
    int bigBlockCount = 2*64*1024;
    int iterations = 25*40*60; //25 counts simulates 1 layer for one second, 5 minutes here
    uint32_t startMillis = getMonotonicMillis();
    int measureIterationCount = 50;
    MMapper mapper("sparse", blockSize, bigBlockCount, true);
    for(int i=0; i<iterations; i++)
    {
        int block = rand()%blockCount;
        mapper.writeBlock(block);
        usleep(timeBetweenBlocks);
        if(i%measureIterationCount==measureIterationCount-1)
        {
            uint32_t elapsedTime = getMonotonicMillis()-startMillis;
            printf("%i took %u\n", i, elapsedTime);
            startMillis = getMonotonicMillis();
        }
    }
    return 0;
}
```

Fairly simplistic test case. I don't expect terribly accurate timing; I'm more interested in general trends. Before running the tests, I ensured that the system was in a fairly steady state, with very little disk write activity occurring, by doing something like:
```
watch grep -e Writeback: -e Dirty: /proc/meminfo
```

There was very little to no disk activity. This is also verified by seeing 0 or 1 in the wa column of the output of `vmstat 1`. I also perform a sync immediately before running the test. Note the aggressive writeback parameters being fed to the vm subsystem as well.
When I run the test on the ext2 partition, the first one hundred batches of fifty writes yield a nice solid average of 2012 ms with a standard deviation of 8 ms. When I run the same test on the ext4 partition, I see an average of 2151 ms, but an abysmal standard deviation of 409 ms. My primary concern is variation in latency, so this is frustrating. The actual times for the ext4 partition test look like this:
```
{2372, 3291, 2025, 2020, 2019, 2019, 2019, 2019, 2019, 2020, 2019, 2019, 2019, 2019, 2020, 2021, 2037, 2019, 2021, 2021, 2020, 2152, 2020, 2021, 2019, 2019, 2020, 2153, 2020, 2020, 2021, 2020, 2020, 2020, 2043, 2021, 2019, 2019, 2019, 2053, 2019, 2020, 2023, 2020, 2020, 2021, 2019, 2022, 2019, 2020, 2020, 2020, 2019, 2020, 2019, 2019, 2021, 2023, 2019, 2023, 2025, 3574, 2019, 3013, 2019, 2021, 2019, 3755, 2021, 2020, 2020, 2019, 2020, 2020, 2019, 2799, 2020, 2019, 2019, 2020, 2020, 2143, 2088, 2026, 2017, 2310, 2020, 2485, 4214, 2023, 2020, 2023, 3405, 2020, 2019, 2020, 2020, 2019, 2020, 3591}
```

Unfortunately, I don't know whether ext2 is an option for the end solution, so I'm trying to understand the difference in behavior between the file systems. I would most likely have control over at least the flags used to mount the ext4 file system, and could tweak those.
noatime/nodiratime don't seem to make much of a dent
barrier=0/1 doesn't seem to matter
nodelalloc helps a bit, but doesn't do nearly enough to smooth out the latency variation.
The ext4 partition is only about 10% full.
Thanks for any thoughts on this issue!