
My embedded Linux system, which uses SquashFS as its root filesystem, has unexpectedly stopped booting. The system is designed with an OverlayFS whose writable layer lives on a temporary RAM disk, providing read-write capabilities on top of a reliable, read-only SquashFS base.

The core issue appears to be a corrupted SquashFS root filesystem. My understanding was that even if the OverlayFS experienced an issue (e.g., memory corruption in the RAM disk), it should be impossible for the operating system to modify the underlying SquashFS, as SquashFS is inherently read-only.

Given this setup, what are the possible mechanisms that could lead to a corrupted SquashFS root filesystem?

System details (to the best of my knowledge; more can be provided if necessary):

  • Root Filesystem: SquashFS
  • Read-Write Layer: OverlayFS on a RAM disk (tmpfs)
  • Bootloader: GRUB
  • Storage Medium for SquashFS: NAND
  • Kernel Version: Linux 4.19.155
  • Hardware Platform: Intel x64

What I've already considered (and why I think it's unlikely, but open to correction):

  • OverlayFS issues: I believe problems with the OverlayFS (e.g., corruption of the upperdir or workdir) should only affect the writable layer and not propagate to the read-only lower SquashFS.
  • Normal operation: The system's design is specifically to prevent writes to the SquashFS during normal runtime.

My main question revolves around how a fundamentally read-only filesystem, protected by an OverlayFS, could become corrupted.

Any insights into software failures, hardware failures, or misconfigurations that could lead to this situation would be greatly appreciated.

1 Answer


Possibilities for the cause of a corrupted squashfs image include:

  • silent failure of the underlying media, causing random changes to the squashfs image (for example, bit flips or zeroed blocks)
  • external tampering with the storage holding the squashfs image, including attempts to edit it, truncation, or an interrupted or faulty update of the image
  • bad RAM or bad cables to the storage media: these would cause boot failures in different places from one boot to the next, because the copy of the squashfs image read into memory may be corrupted differently each time
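The failure modes above leave different fingerprints, so it can help to look at how a damaged image differs from a good one rather than only whether it differs. The following is a minimal sketch, assuming a known-good copy of the image is available and both copies fit in memory; the function name `classify_damage` and the 4096-byte block size are illustrative choices, not anything SquashFS itself prescribes.

```python
def classify_damage(good: bytes, suspect: bytes, block_size: int = 4096):
    """Compare two images block by block.

    Returns a list of (offset, kind, detail) tuples where kind is
    "bitflips" (detail = number of differing bits in the block),
    "zeroed" (detail = 0, the suspect block is all zero bytes), or
    "truncated" (detail = number of missing bytes at the end).
    """
    findings = []
    limit = min(len(good), len(suspect))  # only compare the common prefix
    for offset in range(0, limit, block_size):
        end = min(offset + block_size, limit)
        g = good[offset:end]
        s = suspect[offset:end]
        if g == s:
            continue
        if not any(s):
            # Whole block reads back as zeros: looks like a lost block.
            findings.append((offset, "zeroed", 0))
        else:
            # Count the individual bits that differ between the copies.
            flipped = sum(bin(a ^ b).count("1") for a, b in zip(g, s))
            findings.append((offset, "bitflips", flipped))
    if len(suspect) < len(good):
        findings.append((len(suspect), "truncated", len(good) - len(suspect)))
    return findings
```

A handful of single-bit differences scattered through the image would point toward media or memory errors, whole zeroed blocks toward lost or unwritten blocks, and a short image toward truncation or an interrupted update. If the findings change between two reads of the same image, the read path (RAM, cabling, controller) becomes the more likely suspect.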

Likely next steps would include hardware diagnostics and a direct comparison or secure hash comparison of the existing squashfs image to a known good original version.
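The secure hash comparison could look roughly like the sketch below. It assumes the suspect image can be read as a file or block device and that a known-good copy exists; the paths in the usage note are hypothetical. Because a partition is often larger than the image stored in it, the sketch hashes only as many bytes of the suspect as the known-good file contains.

```python
import hashlib
import os


def sha256_of(path, length=None, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`.

    If `length` is None, the whole file is hashed. Data is read in chunks
    so large images do not need to fit in memory.
    """
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(want)
            if not chunk:
                break  # end of file reached before `length` bytes
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def image_matches(suspect_path, known_good_path):
    """Compare the suspect image against a known-good copy by SHA-256.

    Only the first len(known_good) bytes of the suspect are hashed, so
    trailing padding in a larger partition is ignored.
    """
    good_size = os.path.getsize(known_good_path)
    return sha256_of(suspect_path, good_size) == sha256_of(known_good_path)
```

For example, `image_matches("/dev/sdX2", "rootfs-original.squashfs")` (both names are placeholders) would return `False` if any byte within the image's length differs. Running the check from a separate, known-good system avoids relying on the possibly faulty hardware to verify itself.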
