NFS mounts getting unmounted, possibly by the kernel

Question

Recently, on a few machines, mount points have started to disappear (one mount point per server, at random intervals and on random machines), and I can find nothing in the logs. I have five mount points, and any of them may go away at random. There is no relation between the disappearance and the mount protocol (both TCP and UDP mounts disappear).

What I have not tried
Running tcpdump continuously (I am reluctant to do so, since this issue happens only once every 2-3 days...)

Info about the machines:
NFS booted; the boot server is FreeBSD 11.0 (nothing in its logs, by the way). The rootfs options are:
(rw,noatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=ADDRESS,mountvers=3,mountport=677,mountproto=udp,local_lock=all,addr=ADDRESS)
The OS is CentOS 7, running the 4.11.0-1 ML kernel. Example mount options:
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=ADDRESS,mountvers=3,mountport=4002,mountproto=udp,local_lock=none,addr=ADDRESS)
(rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=ADDRESS,mountvers=3,mountport=4002,mountproto=udp,local_lock=none,addr=ADDRESS,_netdev)
Info about the NFS shares and server
I have 5 distinct NFS servers in total; load balancing is done over DNS, and all of them export the same mount points (a few shares over UDP, a few over TCP). The NFS servers run RHEL 6.7 (Santiago), kernel version 2.6.32-573.el6; snfs-common/server/client is 4.7.3. As you may have guessed, there is nothing in the server logs relating to this problem. Example export options:
(rw,async,root_squash,no_wdelay,no_subtree_check,fsid=ID)
Things I have tried so far:
My first assumption was that some process was calling umount or umount2, for some bizarre reason, on a random NFS share. After tracing with sysdig for the unlinkat, unlink, umount and remove system calls, I could only see systemd-logind calling umount2 when a user session is destroyed, but not on the affected mount points. The sysdig filter I used in a chisel is posted below:
    function on_init()
        local filename = path
        for i in string.gmatch(path, "[^/]+") do
            filename = i
        end
        print("PID\tPROC_NAME\tPROC_EXEC\tPROC_SID\tPROC_PNAME\tPROC_PPID\tPROC_EXELINE\tPROC_PCMDLINE")
        chisel.set_event_formatter("%proc.pid\t%proc.name\t%proc.exec\t%proc.sid\t%proc.pname\t%proc.ppid\t%proc.exeline\t%proc.pcmdline ")
        chisel.set_filter(
            "(evt.type=unlinkat and evt.arg.name=" .. path .. ") or \
            (evt.type=unlink and evt.arg.path=" .. path .. ") or \
            (evt.type=umount) or \
            (evt.type=remove and evt.arg.path=" .. path .. ")")
        return true
    end
The unmount happened again at random, but the filter did not catch it. Thinking the filter was flawed, I wrote a program that unmounts a share with both umount and umount2 (I tried both the lazy and the force flags), and the filter detected those calls correctly. This leads me to believe that the kernel itself is unmounting things.
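A test program of the kind described above could look roughly like the following. This is only a sketch, not the program actually used: it assumes Linux, calls umount2(2) through ctypes, and the helper name `umount2` and the example path are illustrative. The flag values are the ones defined in <sys/mount.h>.

```python
import ctypes
import os

# Flag values from <sys/mount.h> on Linux.
MNT_FORCE = 1   # force the unmount even if the filesystem is busy
MNT_DETACH = 2  # lazy unmount: detach now, clean up when no longer busy

# Load the C library of the running process (assumes Linux).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.umount2.argtypes = [ctypes.c_char_p, ctypes.c_int]
_libc.umount2.restype = ctypes.c_int


def umount2(target, flags=0):
    """Call umount2(2) on target; raise OSError with the kernel's errno on failure."""
    if _libc.umount2(os.fsencode(target), flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)


# Example (needs root, and actually unmounts the share):
#   umount2("/mountpoint", MNT_DETACH)   # lazy
#   umount2("/mountpoint", MNT_FORCE)    # force
```

Running this against a test share while the chisel is active is a quick way to confirm that the filter reports the umount event for both flag variants.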
I have nothing in my logs, not even the usual "nfs not responding" message that appears when there is a problem with the share.
If I log in to a machine and remount, the remount succeeds without any problem.
I have numerous clients running the same setup, and this does not happen there. The only things this group of machines has in common are the network segment and the NFS boot server. But I fail to see why absolutely nothing would be reported if communication between the server and the client had died.
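Since nothing is logged when a mount goes away, one cheap alternative to a continuous packet capture is to poll the mount table and record a timestamp when an NFS mount disappears, which can then be matched against other logs. The following is a minimal sketch under the assumption that /proc/mounts is readable; the function names `nfs_mounts` and `watch` are made up for this example.

```python
import time


def nfs_mounts(mounts_text):
    """Return the mount points of all NFS entries in /proc/mounts-style text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.add(fields[1])
    return points


def watch(interval=1.0, path="/proc/mounts"):
    """Poll the mount table forever; print a timestamp when an NFS mount vanishes."""
    with open(path) as f:
        previous = nfs_mounts(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = nfs_mounts(f.read())
        for gone in sorted(previous - current):
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            print(f"{stamp} NFS mount disappeared: {gone}")
        previous = current
```

Calling `watch()` in a background session on one of the affected machines gives the time of each disappearance to within the polling interval. It does not show who or what performed the unmount.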

See unix.stackexchange.com/questions/114699/… and webcache.googleusercontent.com/… for possible solutions.
– Deathgrip, Commented Jul 24, 2017 at 19:02
You need to provide more info on how you are mounting the filesystems e.g. the /etc/fstab entry or whatever ...
– Murray Jensen, Commented Jul 25, 2017 at 0:10
@RuiFRibeiro Not that I know of. As I said, I have 2000 clients (with different kernels) running the same setup and I only experience this problem on a subset of 20 machines. I have no idea where to look.
– Hristo Mohamed, Commented Jul 25, 2017 at 8:39
@MurrayJensen I have already provided the mount from /proc/mounts. I am mounting with fstab, example mount is: server:nfs_share /mountpoint nfs rw,hard,intr,nfsvers=3 0 0
– Hristo Mohamed, Commented Jul 25, 2017 at 8:41

sourcejedi · Accepted Answer · 2017-10-27 15:22:30Z

If anybody cares, this was an NFS bug in the kernel. It should be fixed by commit cc89684c9a265828ce061037f1f79f4a68ccd3f7:

NFS: only invalidate dentrys that are clearly invalid

Since commit bafc9b754f75 ("vfs: More precise tests in d_invalidate") in v3.18, a return of '0' from ->d_revalidate() will cause the dentry to be invalidated even if it has filesystems mounted on it or on a descendant. The mounted filesystem is unmounted.

...
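As a rough triage aid, a client's kernel release can be compared against v3.18, the version the commit message names as introducing the d_invalidate behaviour. The sketch below only parses the upstream version number. It cannot tell whether a distribution kernel has the change or the fix backported, so treat the result as a hint. The function names are illustrative.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a kernel release string such as '4.11.0-1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_d_invalidate_change(release):
    """True if the upstream version is v3.18 or later (backports are not detected)."""
    return kernel_version(release) >= (3, 18)


if __name__ == "__main__":
    running = platform.release()
    print(running, has_d_invalidate_change(running))
```

With the versions from the question, the client kernel 4.11.0-1 is at or above v3.18 and the server kernel 2.6.32-573.el6 is below it.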
