Maybe why OverlayFS had its readdir() inode number issue

October 11, 2025

A while back I wrote about readdir()'s inode numbers versus OverlayFS, which discussed an issue where for efficiency reasons, OverlayFS sometimes returned different inode numbers in readdir() than in stat(). This is not POSIX legal unless you do some pretty perverse interpretations (as covered in my entry), but lots of filesystems deviate from POSIX semantics every so often. A more interesting question is why, and I suspect the answer is related to another issue that's come up, the problem of NFS exports of NFS mounts.

What's common in both cases is that NFS servers and OverlayFS both must create an 'identity' for a file (a NFS filehandle and an inode number, respectively). In the case of NFS servers, this identity has some strict requirements; OverlayFS has a somewhat easier life, but in general it still has to create and track some amount of information. Based on reading the OverlayFS article, I believe that OverlayFS considers this expensive enough to only want to do it when it has to.

OverlayFS definitely needs to go to this effort when people call stat(), because various programs will directly use the inode number (the POSIX 'file serial number') to tell files on the same filesystem apart. POSIX technically requires OverlayFS to do this for readdir(), but in practice almost everyone that uses readdir() isn't going to look at the inode number; they look at the file name and perhaps the d_type field to spot directories without needing to stat() everything.

If there was a special 'not a valid inode number' signal value, OverlayFS might use that, but there isn't one (in either POSIX or Linux, which is actually a problem). Since OverlayFS needs to provide some sort of arguably valid inode number, and since it's reading directories from the underlying filesystems, passing through their inode numbers from their d_ino fields is the simple answer.

(This entry was inspired by Kevin Lyda's comment on my earlier entry.)

Sidebar: Why there should be a 'not a valid inode number' signal value

Because both standards and common Unix usage include a d_ino field in the structure readdir() returns, they embed the idea that the stat()-visible inode number can easily be recovered or generated by filesystems purely by reading directories, without needing to perform additional IO. This is true in traditional Unix filesystems, but it's not obvious that you would do that all of the time in all filesystems. The on disk format of directories might only have some sort of object identifier for each name that's not easily mapped to a relatively small 'inode number' (which is required to be some C integer type), and instead the 'inode number' is an attribute you get by reading file metadata based on that object identifier (which you'll do for stat() but would like to avoid for reading directories).

But in practice if you want to design a Unix filesystem that performs decently well and doesn't just make up inode numbers in readdir(), you must store a potentially duplicate copy of your 'inode numbers' in directory entries.

Written on 11 October 2025.
« Keeping notes is for myself too, illustrated (once again)
The early Unix history of chown() being restricted to root »

Page tools: View Source, Add Comment.

Last modified: Sat Oct 11 22:53:18 2025
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.