
I have 19 CentOS Linux virtual servers for a specific solution. These servers are distributed across three network zones: C, D, and S.

The NFS server is located in zone S and is configured to expose a share that the servers in all three network zones can mount. Firewall and routing rules between C <-> S and D <-> S are established and working. The NFS share has been confirmed to work in the past.

The NFS settings are configured on one specific server of the solution, which distributes them to all the other servers. For example, after setup it mounts the share automatically on all of them and adjusts the /etc/fstab file on all 19 CentOS servers.

The problem: intermittently, the mounted share disappears on some of the 19 servers, and I don't know why.

When this happens, I am not able to remount the share manually with the mount command. There are additional symptoms: df -h does not respond, cd into the mount point hangs the SSH session, and tcpdump shows packets with checksum errors while I try to mount the share again.

A reboot of the server clears the problem and the share is automatically mounted again.

I would like to make the NFS setup more resilient.

What I found out so far:

  • df -h hangs, but can be terminated via CTRL+C
  • cd into mount point freezes SSH session
  • umount -f MOUNTPOINT returns busy device
  • umount -l MOUNTPOINT works
  • manual mount via mount -t nfs IP:SHARE MOUNTPOINT doesn't work, i.e. it hangs indefinitely
  • mount | grep nfs > only sunrpc on ...
  • nfsstat -m > returned nothing
  • uname -r > 3.10.0-1160.95.1.el7.x86_64
  • ps aux | grep " D " > root PID 0.0 0.0 0 0 ? D Aug13 0:00 [NFS_SERVER_IP-man]
  • Vendor doesn't know what's going on
  • cat /etc/fstab > NFS_IP:SHARE MOUNTPOINT nfs defaults 0 0
  • ps -e | grep nfs > nfsiod and nfsv4.1-svc
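
To collect these findings without risking another frozen session, I now probe the mount point through timeout(1) first. This is only a sketch (the probe function name and the 5-second deadline are my own choices, not from any vendor tooling); it relies on GNU coreutils timeout, which is present on CentOS 7 by default:

```shell
#!/bin/sh
# Sketch of a hang-safe probe for a possibly dead NFS mount.
# A plain stat on a dead hard-mounted NFS share blocks in
# uninterruptible (D) sleep; timeout(1) bounds the call instead.
probe_mount() {
    mnt="$1"
    if timeout 5 stat -t "$mnt" >/dev/null 2>&1; then
        echo "$mnt: responsive"
    else
        echo "$mnt: hung or missing"
    fi
}

# Example: /tmp is local, so this reports "responsive".
probe_mount /tmp
```

Running this from cron against the real mount point would at least log exactly when a client loses the share, instead of discovering it after df -h hangs.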

Based on my limited understanding, a kernel thread is stuck in uninterruptible sleep (the D state in ps) and cannot be recovered on this old kernel. Additionally, because of network issues the NFS connection could not be re-established automatically, but that behavior can be influenced by specific mount options in /etc/fstab (e.g. soft).
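
For reference, the direction I am considering for /etc/fstab: keeping the mount hard (since soft risks data corruption on I/O errors), but bounding the RPC retry behavior and decoupling the mount from boot ordering. The option values below are illustrative, not tuned recommendations:

```
# hard      : retry forever rather than return EIO to applications
# timeo=600 : 60 s per RPC attempt before retransmit (value is in deciseconds)
# retrans=2 : retransmissions before "server not responding" is logged
# _netdev   : wait for the network before mounting at boot
# nofail    : do not block the boot if the NFS server is unreachable
NFS_IP:SHARE  MOUNTPOINT  nfs  hard,timeo=600,retrans=2,_netdev,nofail  0  0
```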

I would appreciate any help solving the problem, as well as technical hints I can use to communicate the issue internally and with the vendor.

  • Are you talking about a reboot of the NFS server, or about a reboot of the servers that are NFS clients? Commented Nov 14 at 9:50
  • Hi, thanks for your comments. 1) Posted it already on Stack Overflow, but there it was closed immediately. I know, CentOS is out of support, but unfortunately we still have to use it. 2) sorry, I wasn't precise enough. I am talking about rebooting the NFS clients (the VMs of the solution, connecting to the NFS server) Commented Nov 14 at 10:14
  • Unless I missed it, you haven't mentioned contents of log files Commented Nov 14 at 11:27
  • 3
    "the mounted share disappears" - what does this mean? Commented Nov 14 at 17:06
  • "that can be changed with specific settings in /etc/fstab (e.g. soft)" - In general, that is a really bad idea. There's waaaay too much code that doesn't handle IO issues gracefully, and any such code will corrupt data if an NFS operation fails because of the soft option. And if you're running executables or any of the apps use mmap() to read data from the NFS filesystem, IO failures will cause corrupted reads. Commented Nov 15 at 15:38

