My Ubuntu Xenial system is as follows:

    # uname -r
    4.4.0-179-generic

I have the following simple C code which writes 5 bytes to a file every second, for a total of 5 times:

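    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* No O_CREAT: test.txt must already exist. */
        int fd = open("test.txt", O_WRONLY | O_TRUNC);
        if (fd == -1)
            return -1;

        int x = 0;
        while (x++ < 5) {
            write(fd, "Hello", 5);  /* 5 bytes per iteration */
            sleep(1);
        }

        close(fd);
        return 0;
    }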

I also have the following:

    # cat /proc/sys/vm/dirty_writeback_centisecs
    500

However, when I tail -f test.txt I see the output appear in the file immediately: according to the above, I would expect to see it appear only after 5 seconds. I have done lots of research into this, finding that pdflush is not present anymore, but have been unable to find out what process/thread is responsible for writing back dirty pages in my kernel.

Can someone please clear this up, and explain how I can control when data gets flushed to my text file?

Update:

After the answer below, I tested to see the modification time of the file. LinuxScratch is the name of the binary that the code shown earlier is in.

We can see that the file is indeed being updated every second:

    $ ./LinuxScratch &
    [1] 1475
    $ ls --time-style='+%d-%m-%Y %H:%M:%S' -l
    total 104
    -rwxrwxr-x 1 ubuntu ubuntu 37576 22-12-2020 19:35:46 LinuxScratch
    -rw-rw-r-- 1 ubuntu ubuntu    15 22-12-2020 19:41:47 test.txt
    $ ls --time-style='+%d-%m-%Y %H:%M:%S' -l
    total 104
    -rwxrwxr-x 1 ubuntu ubuntu 37576 22-12-2020 19:35:46 LinuxScratch
    -rw-rw-r-- 1 ubuntu ubuntu    20 22-12-2020 19:41:48 test.txt
    $ ls --time-style='+%d-%m-%Y %H:%M:%S' -l
    total 104
    -rwxrwxr-x 1 ubuntu ubuntu 37576 22-12-2020 19:35:46 LinuxScratch
    -rw-rw-r-- 1 ubuntu ubuntu    25 22-12-2020 19:41:49 test.txt
    $ ls --time-style='+%d-%m-%Y %H:%M:%S' -l
    total 104
    -rwxrwxr-x 1 ubuntu ubuntu 37576 22-12-2020 19:35:46 LinuxScratch
    -rw-rw-r-- 1 ubuntu ubuntu    25 22-12-2020 19:41:49 test.txt
    [1]+  Done                    ./LinuxScratch
    $

Update 2

I have also found out that regarding the threads that are responsible for flushing, this answer states that it is generic [kworker/#.##] kernel threads that are responsible for doing the writeback.

Update 3

After reading the LWN.net article in the answer, I did more research and found that by experimenting with values for /proc/sys/vm/dirty_expire_centisecs (as described here), and running a watch on /proc/vmstat, I can see the dirty pages increase then decrease as they are flushed.

    $ sudo sysctl -w vm.dirty_expire_centisecs=250
    $ watch -d -n 0.1 grep -e dirty /proc/vmstat

In another shell:

$ ./LinuxScratch 

This shows the dirty page count (nr_dirty) increase by 1, then drop back down after roughly 2.5 seconds. Either way, that is far sooner than with the default value of 3000 (30 seconds).
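The same counter can be polled from C rather than through watch. This is a minimal sketch, assuming the counter appears in /proc/vmstat as a line of the form "nr_dirty <value>" (as the grep above suggests); the function name read_nr_dirty is hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Returns the current nr_dirty value from /proc/vmstat, or -1 on failure.
   Assumption: each line of /proc/vmstat has the form "<name> <value>". */
long read_nr_dirty(void)
{
    FILE *f = fopen("/proc/vmstat", "r");
    if (f == NULL)
        return -1;

    char name[64];
    long value;
    long result = -1;

    /* Scan name/value pairs until the nr_dirty entry is found. */
    while (fscanf(f, "%63s %ld", name, &value) == 2) {
        if (strcmp(name, "nr_dirty") == 0) {
            result = value;
            break;
        }
    }

    fclose(f);
    return result;
}
```

Calling read_nr_dirty() in a loop every 100 ms while LinuxScratch runs should show the same rise and fall as the watch command, although other activity on the system also changes the counter.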

This page was also useful for parameter descriptions.

So, all told, the accepted answer partially explains the original question, and my updates here show how to demonstrate and control the writeback. I ultimately wanted to be able to stop the flushing of kernel buffers to the filesystem, just to see (and hence fully understand) the functionality. From the accepted answer, though, I gather that this might be impossible to observe, because the kernel always serves the latest data from its cache.
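The opposite direction is available to the writing process itself: instead of waiting for background writeback, it can ask for its data to be flushed with fsync(). The following is only a sketch; the helper name write_durably and its choice of open flags (including O_CREAT) are assumptions made for illustration and are not part of the original program.

```c
#include <fcntl.h>
#include <unistd.h>

/* Writes len bytes to path, then blocks in fsync() until the kernel reports
   that the file's dirty pages have been written to the storage device.
   Returns 0 on success, -1 on error. */
int write_durably(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* Force writeback of this file now rather than waiting for the
       periodic flusher threads. */
    if (fsync(fd) == -1) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

With fsync() after each write, the data should not linger in a dirty state waiting for the dirty_expire_centisecs timeout, at the cost of a blocking call on every write.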

1 Answer

Your tail command also gets its input from the page cache. The whole point of the page cache is that a process (any process, not only the process writing to the file) can find the wanted data quickly from the cache, without accessing the disk.
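This can be illustrated with a short C sketch: data written through one file descriptor is readable through a second, independently opened descriptor right away, with no fsync() in between. The function name write_then_read is hypothetical, and the second descriptor merely stands in for a separate reader such as tail; the sketch does not prove where the bytes physically are, only that the reader does not have to wait for writeback.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes "Hello" to path through one descriptor, then reads the file back
   through a second descriptor without any fsync() in between.
   Returns the number of bytes read into buf, or -1 on error. */
ssize_t write_then_read(const char *path, char *buf, size_t buflen)
{
    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (wfd == -1)
        return -1;

    if (write(wfd, "Hello", 5) != 5) {
        close(wfd);
        return -1;
    }

    /* Independent descriptor, standing in for another reader. */
    int rfd = open(path, O_RDONLY);
    if (rfd == -1) {
        close(wfd);
        return -1;
    }

    ssize_t n = read(rfd, buf, buflen);

    close(rfd);
    close(wfd);
    return n;
}
```

The read returns the five bytes immediately, regardless of the dirty_writeback_centisecs setting.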

The pdflush kernel threads have been replaced by per-backing-device kernel threads. This LWN.net article describes the change and the motivation behind it.

  • Ah-ha, that makes sense. Thanks. However, the file is still being written; when I run the above code in the background, and execute ls --time-style='+%d-%m-%Y %H:%M:%S' -l several times, I see test.txt has a modification time that is updated every second. Hence the file is being updated, so the kernel buffers written by write() must be being flushed. This is what I want to control: the frequency at which this flush happens. How can I do this? Commented Dec 22, 2020 at 19:40
  • The kernel maintains an illusion that files are written to and read from a disk or some other medium, and that the files (with associated timestamps) are created on said medium. I say illusion, because all this is sped up by caching data in the page cache. The semantics are the same even if the actual write is delayed. Thus the modification times are also "faked", and do not necessarily reflect the true flush time. Commented Dec 22, 2020 at 20:24
  • Right, ok; thanks. I really want to learn more about the automatic flushing of the buffers, but it seems this might be quite hard. Do you know how this happens: what process/thread is doing it, and how I can watch it on my system? As I say, I don't have a pdflush and don't know what is doing it on my system... Commented Dec 22, 2020 at 20:30
  • I added a link to my answer. Commented Dec 23, 2020 at 6:39
  • Thanks. I've updated my question with my additional findings/experiments for easy reference for others. Commented Dec 23, 2020 at 17:02