I hope that this is not a duplicate question. I have seen several similar questions, where the answer was to blacklist the respective device or partition. But in my case, I can't do that (see below). Having said this:
On a Debian buster x64 host, I have created a VM (based on QEMU). The VM runs on a block device partition, let's say /dev/sdc1. I installed the Debian system on that partition basically as follows (some steps omitted):
```
#> mkfs.ext4 -j /dev/sdc1
#> mount /dev/sdc1 /mnt/target
#> debootstrap ... bullseye /mnt/target
```

Then I bind-mounted the necessary directories (/dev, /sys etc.), chrooted into /mnt/target, completed the guest OS installation and booted the VM.
The VM started without issues at first. But with every reboot, the VM developed more problems, which I repaired at the GRUB and initramfs prompts, until repair was no longer possible because the ext4 file system had obviously been damaged.
Because I originally thought that I had done something wrong, e.g. forgot to unmount the ext4 partition before starting the VM, I repeated the whole installation from scratch multiple times. The result was the same in every case: After a few restarts, the ext4 file system was so damaged that I couldn't repair it.
By accident, I then found the reason for this, but have no idea how to solve the problem. I noticed that e2fsck refused to operate on that partition, claiming that it was in use, although it was not mounted and the VM was not running. Further investigation showed that there was a kernel thread jbd2/sdc.
That means that the host kernel accesses the journal on that partition / file system. When I start the VM, the guest kernel of course does the same. I am nearly sure that the file system corruption is caused by both kernels accessing the file system, notably the journal, at the same time.
How can I solve the problem?
I cannot blacklist the respective disk or the respective partition on the host, because I need to mount them there to prepare or complete the guest OS installation in a chroot. On the other hand, it doesn't seem possible to tell the host kernel to release the journal as soon as the VM starts.
I have installed a lot of VMs in the past years exactly the same way, but did not turn on the journal when creating their ext4 file system. Consequently, I didn't have that issue with those VMs.
Edit 1
In case it is relevant, when mounting the partition and chrooting into it to complete the guest OS installation, I use the following commands:
```
cd /mnt
mkdir target
mount /dev/sdc1 target
mount --rbind /dev target/dev
mount --make-rslave target/dev
mount --rbind /proc target/proc
mount --make-rslave target/proc
mount --rbind /sys target/sys
mount --make-rslave target/sys
LANG=C.UTF-8 chroot target /bin/bash --login
```

When unmounting, I just do
```
umount -R target
```

The umount command does not report any error.
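After unmounting, the lingering jbd2/sdc thread described above can be checked for before starting the VM. A minimal sketch, assuming the usual `jbd2/<device>-<n>` naming of jbd2 kernel threads and using `sdc1` from the question (the matching logic is factored into a function so it can be tested against a plain process list):

```shell
#!/bin/sh
# Sketch: check whether the host kernel still runs a jbd2 journal
# thread for a given partition (here sdc1, as in the question).
# jbd2 kernel threads show up in the process list as "jbd2/<dev>-<n>".

check_journal() {
    # Reads a list of process names on stdin; succeeds (exit 0) if a
    # jbd2 thread for partition "$1" is present.
    grep -q "^jbd2/$1-"
}

if ps -eo comm= | check_journal sdc1; then
    echo "host kernel still holds the journal of sdc1 - do not start the VM"
else
    echo "no jbd2 thread for sdc1"
fi
```

If the thread is still listed even though the partition is unmounted, starting the VM risks the dual-access corruption described above.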
Comments

Comment: Have you tried passing -o norecovery to mount? And are the --rbind and --make-rslave double calls of mount for each mountpoint really necessary, instead of a single mount call per mountpoint with a simple -B? That might be causing the issue.

Reply (by the asker): With -o norecovery, the host kernel does not put its hands on the ext4 partition's journal, and there are no jbd2/sdc entries any more in the output of lsof. If you make your comment an answer, I'll accept it. Besides that, I guess that the Debian kernel is buggy: I still can't even e2fsck that partition as soon as I have mounted and unmounted it, but at least it doesn't damage the file system any more.
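Following the suggestion from the comments, the preparation step could look like the sketch below. This is an assumption-laden outline, not a verified procedure: `norecovery` (alias `noload`) tells ext4 not to load the journal on mount, so the host kernel creates no jbd2 thread for the device; it is only safe on a cleanly unmounted file system, so after a guest crash the journal should first be replayed by fsck.

```shell
#!/bin/sh
# Sketch: mount the guest partition without the host kernel attaching
# to the ext4 journal. Device and mount point are those from the
# question; run as root. NOT safe after an unclean guest shutdown --
# run "e2fsck -f /dev/sdc1" first in that case to replay the journal.
set -e

DEV=/dev/sdc1
TARGET=/mnt/target

mount -o norecovery "$DEV" "$TARGET"
# ... bind mounts, chroot work as before ...
umount -R "$TARGET"
```

The guest kernel still mounts the file system normally inside the VM, so the journal remains in use there; the option only keeps the host from touching it during preparation.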