
In Linux, filesystem timestamps seem to be consistently a few milliseconds behind system time, which leads to inconsistencies if you want to check whether a file was modified before or after a given time at very fine (millisecond) resolution.

On any Linux system with a filesystem that supports nanosecond resolution (I tried ext4 with 256-byte inodes, and ZFS), if you run something like:

date +%H:%M:%S.%N; echo "hello" > test1; stat -c %y test1 | cut -d" " -f 2 

the second output value (the file's modification time) is always a few milliseconds behind the first one (system time), e.g.:

17:26:42.400823099
17:26:42.395348462

while it should be the other way around, since test1 is modified after the date command runs.

You can reproduce the same result in Python:

import os, time

def test():
    print(time.time())
    with open("test1", "w") as f:
        f.write("hello")
    print(os.stat("test1").st_mtime)

test()

1698255477.3125281
1698255477.3070245

Why is this, and is there a way to avoid it, so that system time is consistent with filesystem time? The only workaround I have found so far is to get the filesystem "time" (whatever that means in practice) by creating a dummy temporary file and reading its modification time, like this:

import os, tempfile

def get_filesystem_time():
    """Get the current filesystem time by creating a temporary
    file and reading its modification time."""
    with tempfile.NamedTemporaryFile() as f:
        return os.stat(f.name).st_mtime

but I wonder if there is a cleaner solution.


2 Answers


The time used for file timestamps is the time at the last timer tick, which is always slightly in the past. The current_time function in inode.c calls ktime_get_coarse_real_ts64:

/**
 * current_time - Return FS time
 * @inode: inode.
 *
 * Return the current time truncated to the time granularity supported by
 * the fs.
 *
 * Note that inode and inode->sb cannot be NULL.
 * Otherwise, the function warns and returns time without truncation.
 */
struct timespec64 current_time(struct inode *inode)
{
	struct timespec64 now;

	ktime_get_coarse_real_ts64(&now);

	if (unlikely(!inode->i_sb)) {
		WARN(1, "current_time() called with uninitialized super_block in the inode");
		return now;
	}

	return timestamp_truncate(now, inode);
}

and the latter is part of a family of functions documented as follows:

These are quicker than the non-coarse versions, but less accurate, corresponding to CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE in user space, along with the equivalent boottime/tai/raw timebase not available in user space.

The time returned here corresponds to the last timer tick, which may be as much as 10ms in the past (for CONFIG_HZ=100), same as reading the 'jiffies' variable. These [functions] are only useful when called in a fast path and one still expects better than second accuracy, but can't easily use 'jiffies', e.g. for inode timestamps. Skipping the hardware clock access saves around 100 CPU cycles on most modern machines with a reliable cycle counter, but up to several microseconds on older hardware with an external clocksource.

Note the specific mention of inode timestamps.

I’m not aware of any way of avoiding this entirely, short of modifying the kernel. You can reduce the impact by increasing CONFIG_HZ. There has been a recent proposal to improve this, which is still being worked on.

  • ooh, that makes some sense. Commented Oct 26, 2023 at 10:51
  • wow, I would have thought that for "expensive" operations like storing metadata, the Linux kernel would actually use the CPU's high-resolution timer. But yeah, it makes sense: this has to work on all machines, not just modern x86/aarch64, and the "interestingness" of different code paths for different architectures in filesystems is something people concerned with reliability might not think too highly of! Commented Oct 26, 2023 at 11:00
  • Investigated this a bit more in my answer, which is really just to be seen as an add-on to yours. Commented Oct 26, 2023 at 11:26
  • "The time returned here corresponds to the last timer tick [. . .] same as reading the 'jiffies' variable. These are only useful when...". The "these" in that sentence refers to the ktime_get_coarse* family of functions. This is a bit clearer if one reads the documentation page in context. Commenting here because it was confusing me. Commented Oct 26, 2023 at 21:14
  • Funny thing is that POSIX allows for a bit of uncertainty in the mtime, but requires that the value be at or after the time of the last write, for reasons hinted at in the question. They should have added one tick. Commented Oct 27, 2023 at 3:35

Stephen Kitt's answer seems to be spot-on.

We can reproduce this very nicely by reading the same "coarse" clock that the filesystem uses, at least on my kernel configuration: a C program that reads the coarse realtime clock just before writing the file sees either exactly the file's timestamp, or (rarely) a timestamp one system tick earlier:

// excerpt from the program linked above, not a relicensing
// …
clock_gettime(CLOCK_REALTIME_COARSE, &now_coarse);
clock_gettime(CLOCK_REALTIME, &now);

int fd = open("temp", O_WRONLY | O_CREAT);
write(fd, data, length);
close(fd);

clock_gettime(CLOCK_REALTIME, &now_after);
stat("temp", &props);

printf("Differences relative to coarse clock before:\n"
       "Fine Realtime before:   %+8jd ns\n"
       "File Modification Time: %+8jd ns\n"
       "Realtime clock after:   %+8jd ns\n",
       ns_difference(&now_coarse, &now),
       ns_difference(&now_coarse, &props.st_mtim),
       ns_difference(&now_coarse, &now_after));

yields something like

Differences relative to coarse clock before:
Fine Realtime before:   +1551810 ns
File Modification Time:       +0 ns
Realtime clock after:   +1626199 ns

with the aforementioned rare occurrence of a delta of roughly one tick, which happens when a system tick falls between reading now_coarse and modifying the file:

Differences relative to coarse clock before:
Fine Realtime before:   +1497562 ns
File Modification Time:  +999992 ns
Realtime clock after:   +1609943 ns

By the way, repeating the above until 10,000 occurrences of this "tick jump" had been collected shows that the range of possible delays is quite small:

Total processed: 777618
Observed tick progressions: 10000
Percentage: 0.013
Minimum tick delta: 999992
Maximum tick delta: 999993
Average tick delta: 999992.905900

In other words, we have extremely close timing when it comes to the deltas between ticks.

  • oh, great job! (got to love answers with C code :P) Commented Oct 26, 2023 at 13:07
  • @ilkkachu why, thank you! (Originally, wanted to do this in C++, then decided against it, because I'd just end up calling C functions, then regretted C as soon as there was data to be handled. The usual.) Commented Oct 26, 2023 at 13:12
  • Now I wonder how badly I hammered my XFS filesystem by running the above "open (create)", "close" "unlink" cycle a couple million times… Commented Oct 26, 2023 at 13:34
  • @MarcusMüller: A negligible amount, I'm sure. Running that program in a shell loop on an XFS filesystem on an otherwise-idle disk, basically zero bytes written (like 68 bytes/sec on average in the second minute, probably just one write to update the timestamp on the directory I was in). XFS does lazy allocation of file data, and apparently lazy-enough updates of directory metadata and inodes. (My mount options were rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota. If I'd had non-lazy atime, then there'd have been writes, although still probably only on writeback intervals.) Commented Oct 27, 2023 at 1:14
