Horrible situation - file systems mounted simultaneously by multiple independent OS instances

Question

How do I get out of this situation safely?

Details are as follows:

A Xen server has block devices allocated to VMs, but these devices have also been mounted inside Xen, i.e. in dom0 itself.

In fact, 44 of these block devices have been mounted like this. To make matters worse, each physical device is seen over 4 paths, and each of those paths is mounted on a separate mountpoint. In other words, each device is actually mounted 5 times: four times in dom0 and once in the guest.
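To confirm the scale of the problem from dom0, one option is to parse /proc/mounts and group the mounted path devices by the PowerPath pseudo device they belong to. The following is a minimal sketch under stated assumptions: the map PATH_TO_PSEUDO is hypothetical, and the path names other than /dev/sdf1 are made up for illustration. On a real host the map would have to be built from PowerPath's own tooling (for example the output of powermt display), which this sketch does not attempt to parse.

```python
import os
from collections import defaultdict

# HYPOTHETICAL map from SCSI path partitions to their PowerPath pseudo device.
# Only /dev/sdf1 -> /dev/emcpowerr comes from the question; the rest are made up.
PATH_TO_PSEUDO = {
    "/dev/sdf1": "/dev/emcpowerr",
    "/dev/sdg1": "/dev/emcpowerr",
    "/dev/sdh1": "/dev/emcpowerr",
    "/dev/sdi1": "/dev/emcpowerr",
}


def parse_mounts(text):
    """Return (device, mountpoint, fstype, options) tuples from /proc/mounts text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # /proc/mounts escapes spaces in paths as \040
        mountpoint = mountpoint.replace("\\040", " ")
        entries.append((device, mountpoint, fstype, options.split(",")))
    return entries


def mounts_per_pseudo_device(text, path_map):
    """Group dom0 mountpoints by the pseudo device their path device belongs to."""
    groups = defaultdict(list)
    for device, mountpoint, _fstype, options in parse_mounts(text):
        pseudo = path_map.get(device)
        if pseudo is not None:
            # A read-write mount in dom0 is the dangerous case.
            groups[pseudo].append((device, mountpoint, "rw" in options))
    return dict(groups)


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        groups = mounts_per_pseudo_device(f.read(), PATH_TO_PSEUDO)
    for pseudo, mounts in groups.items():
        print(f"{pseudo}: mounted {len(mounts)} time(s) in dom0")
        for device, mountpoint, writable in mounts:
            print(f"  {device} on {mountpoint} ({'rw' if writable else 'ro'})")
```

Reporting the rw/ro flag matters because, as one of the answers points out, the real damage comes from multiple writers rather than from multiple mounts as such.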

The guest OS in the VM sees the device through a PowerPath pseudo device (allocated to the domU as a phy: block device).

Some of the devices are formatted as ext2 and some as reiserfs.

No need to explain to me the file system corruption risks involved here.

I am afraid that even just unmounting the file systems may cause corruption, and I feel that at this point pulling the power from the host is the safest option.

Note that the applications in all the VMs, mostly Oracle databases, are still running and in use.

I discovered this while investigating high CPU usage on dom0. There is an unkillable "find" process whose cwd is /media/disk-12, which is mounted from /dev/sdf1, one of the paths behind /dev/emcpowerr.
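A process whose working directory is inside a mountpoint keeps that file system busy, so it helps to know which processes are in that position before attempting any unmount. Below is a minimal Linux-only sketch that reads the /proc/<pid>/cwd symlinks. It only covers working directories; open files and memory mappings are not checked, which tools such as fuser -m or lsof cover more thoroughly. It only identifies processes and does nothing to make them killable. It should be run as root, otherwise other users' processes are silently skipped.

```python
import os


def processes_with_cwd_under(mountpoint, proc_root="/proc"):
    """Return (pid, cwd) for processes whose working directory is inside mountpoint."""
    mountpoint = os.path.realpath(mountpoint)
    prefix = mountpoint.rstrip("/") + "/"
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries in /proc are processes
        try:
            cwd = os.readlink(os.path.join(proc_root, entry, "cwd"))
        except OSError:
            continue  # process exited meanwhile, or permission denied (not root)
        if cwd == mountpoint or cwd.startswith(prefix):
            hits.append((int(entry), cwd))
    return hits
```

For example, processes_with_cwd_under("/media/disk-12") would be expected to list the stuck find process.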

Before anybody asks: the one situation in which I have seen processes that cannot be killed and keep using CPU and RAM (unlike a defunct/zombie process) is when there are outstanding committed I/Os, e.g. sync has returned but the data is not physically on disk yet. This occurs more commonly with tape I/O.
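On Linux, a process blocked in the kernel waiting for I/O is normally reported in state "D" (uninterruptible sleep), and signals, including SIGKILL, are not acted on until the I/O completes. One way to check whether the find process is in that state is to read the state field of /proc/<pid>/stat. The following is a minimal sketch; note that a process spinning in the kernel may instead show as "R", so this check is indicative rather than conclusive.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the *last* ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # single state character after ") "
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) of processes in state 'D' (uninterruptible sleep)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished between listdir and open, or odd line
        if state == "D":
            found.append((pid, comm))
    return found
```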

Suggestions!?

P.S. I would have expected devices to be "reserved" once mounted, to prevent this kind of thing. Or is that not possible on Linux?

EDIT: Firstly, I am convinced that KDE (within the hypervisor) is the culprit. It looks like KDE mounts every device it can at login in order to create desktop icons. The same thing is not happening on other Xen servers, but all of those are running a much older version of SLES and KDE (KDE 4 appears to be the offending one, with 3.4 behaving better).

Furthermore, two non-critical VMs have hung. After being shut down, they would not boot again due to file system corruption. The main/production VM is still running and the database on it is still working, but clearly this is a time bomb. The customer is attempting to rebuild the environment on another VM on another server but is stuck configuring some of the components, so we are waiting.

In any case, I feel that so far none of the answers have gone beyond "best practice is always to shut down gracefully", and I hope to get something more concrete. This situation may warrant more careful thinking. Will shutting down cause outstanding I/O, in particular file system metadata updates from the hypervisor, to be synced to disk and potentially cause major file system corruption?

And right now, any backups taken before "shutting down" may simply back up corrupted data, though in this situation it is more likely that file system metadata is corrupted rather than file contents.
– Johan, Commented Mar 5, 2013 at 16:39
I'm afraid you are going to lose at least some of the data in any case. Turning the host off physically or terminating the VMs forcefully might have the unwanted consequence of messing up everything (i.e. even those file systems that are only mounted once). I would probably try to terminate everything as cleanly as possible to minimise the losses, and of course make sure it doesn't happen again.
– peterph, Commented Mar 5, 2013 at 20:59
As for preventing it, IIUC you might try to set permissions on the device in dom0 once it is opened by the guest, but since permissions on the device files can be bypassed by root (unless you have a patched kernel), that might not help.
– peterph, Commented Mar 5, 2013 at 21:36
Regarding your postscript: if the devices are visible through multiple paths, then the kernel probably doesn't even know that they are all the same device, so how could it "reserve" it? As for exporting a device from dom0 to multiple domUs, Xen lets you do that because you might actually want to do it on purpose (e.g. with a file system that supports it, or mounted read-only everywhere).
– Celada, Commented Mar 5, 2013 at 22:06
@Celada I thought about that, but there are ways of "locking" devices: PowerPath should (and in the case of Solaris does) reserve all the parent paths of a device at the time it initializes. Additionally, SCSI "reserve" commands are managed by the target device, so once a target is reserved, it should refuse a reserve through any of the other paths to that device. At least, that is my limited understanding.
– Johan, Commented Mar 6, 2013 at 4:19

Alien Life Form · Accepted Answer · 2013-03-19 09:20:06Z

If the disks are being written from a single mount point, no harm is being done. Do a clean shutdown (back it up from the suspended state if you like), then fix the mounts. Do not run anything but the bare minimum of applications on dom0. If, on the other hand, partitions are being written from multiple paths, that's BAD and getting worse by the second. Pull the plug.

Johan · Accepted Answer · 2013-03-14 09:05:54Z

I have no concrete reason, but my gut feeling tells me that the following may be the best approach:

1. Shut down the applications.
2. Copy all data from the VM over the network to a backup location.
3. Unmount the file systems from within the VM.
4. Shut down the VM. (There is only one VM running on this host now.)
5. Ensure no domUs are set to start automatically.
6. Pull the power on the host to prevent the hypervisor from performing any "closing" actions, syncing outstanding I/O, etc.
7. Boot up the host, hoping that the hypervisor itself survived the power-yank.
8. If it fails, rebuild the environment. (The VMs' boot disks are file-based, but the data mount points reside on external disks allocated as block devices.)
9. Check whether the hypervisor is mounting any file systems belonging to the domUs, and unmount these before any domUs are started.
10. Turn KDE auto-mounting off.
11. Start up the VM and force a full file system check.
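The check for hypervisor mounts of domU file systems could be made mechanical before any domU is started. The following is a minimal sketch under stated assumptions: the disk entries use the xm-style "phy:<device>,<vdev>,<mode>" form, and the paths_of map from a pseudo device to its underlying SCSI path devices is hypothetical and would have to be filled in from PowerPath. The partition matching also assumes sdX/emcpowerX-style naming, where a partition simply appends a number to the device name.

```python
def phy_targets(disk_specs):
    """Extract block-device paths from xm-style entries such as 'phy:/dev/x,xvdb,w'."""
    targets = []
    for spec in disk_specs:
        target = spec.split(",")[0]
        if target.startswith("phy:"):
            targets.append(target[len("phy:"):])
    return targets


def is_same_or_partition(mounted, device):
    """True if mounted is device itself or a numbered partition of it (sdf -> sdf1)."""
    rest = mounted[len(device):]
    return mounted.startswith(device) and (rest == "" or rest.isdigit())


def dom0_conflicts(disk_specs, mounts_text, paths_of):
    """Return (domU device, dom0-mounted device) pairs that overlap."""
    mounted = [line.split()[0] for line in mounts_text.splitlines() if line.strip()]
    conflicts = []
    for target in phy_targets(disk_specs):
        # Check the pseudo device itself plus every (hypothetical) path behind it.
        candidates = [target] + paths_of.get(target, [])
        for dev in mounted:
            if any(is_same_or_partition(dev, c) for c in candidates):
                conflicts.append((target, dev))
    return conflicts
```

Usage would be to pass the disk entries from each domU's configuration together with the contents of /proc/mounts on dom0, and to refuse to start any domU while the result is non-empty.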

Alternative to step 11: start up the VM and mount the file systems without a full fsck.

The reasoning is that I do not want the Xen hypervisor to have any more chance than absolutely necessary to cause corruption on the domU file systems.

Huygens · Accepted Answer · 2013-03-14 09:21:46Z

I am no Xen expert and have no experience with it yet. But if I were in your place, my approach would be: first, accept that I might lose data (maybe even all of it); second, try to create snapshots, then suspend the VMs and restore them in a different, safe environment.
I do not want to give you false hope, but I think you will be lucky if you can recover anything.

Warning: following this advice could make you lose all data. It is up to you to decide whether it is worth the risk.

With a lot of luck, your applications are still working because the data they are using is all in volatile memory. You should try to take advantage of this situation (try to evaluate whether that could be the case on a per-application basis) and export the live data to a network share if the applications offer such a feature. If any data is on disk, this export function could either hang, much like your find process, or crash (taking the application or OS down with it) because of the changed/corrupted disk data.

Then you could try to take a live snapshot, following the instructions in this article: Creating snapshots in Xen. I would go for the byte-by-byte snapshot, although it could get stuck much like your find command. However, I would not place much hope in this.
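A byte-by-byte copy of a device is essentially a dd-style chunked read and write. The following minimal sketch is a generic chunked copy, not the procedure from the article mentioned above. It also computes a SHA-256 digest so that the copy can be verified once it has been moved to another host. Two caveats apply: a copy taken while the guest or dom0 is still writing to the device is not consistent across the whole device, so it is only meaningful once the VM is suspended or paused; and the reads may block in the same way as the find process. The chunk size is an arbitrary choice.

```python
import hashlib


def copy_device(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src to dst in fixed-size chunks; return (bytes_copied, sha256 hex digest)."""
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # reached the end of the device or file
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()
```

Comparing the returned digest with a sha256sum of the image on the destination host is a cheap way to confirm the transfer was complete.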

Before doing this, you ought to read this document from Citrix, which helps in understanding snapshots in Xen (PDF).

I wish you good luck.

Thank you. The customer does have an export of the database. I think they just used FTP to get it off the VM, but it is possible to mount a network share and export directly to that.
– Johan, Commented Mar 15, 2013 at 6:57
I have been toying with the idea of suspending the VM, copying it in full to another host, and then trying to a) resume it from suspension, or b) boot it up, followed by a reboot and fsck. The idea is that since I would still have the suspended VM on the original host, I may be able to resume that one if the copy doesn't work on the other host.
– Johan, Commented Mar 15, 2013 at 7:01
Also, FWIW, the problem with going back to a backup is that all the backups taken over the last couple of months are feared to be corrupt.
– Johan, Commented Mar 15, 2013 at 7:18
@Johan this is very probably true: most if not all backups taken since the problem occurred are probably corrupted. The same might be true for the database export. Good luck again, you will need it!
– Huygens, Commented Mar 15, 2013 at 7:57
