Does the md subsystem output any messages (to syslog/systemd-journal) to indicate that it's running in a degraded state (or anything else that might indicate that it has successfully reacted to a drive failure, as hinted at here)?

For example, I see lots of errors from sd indicating things like Unrecovered read error but I don't see anything like "retried successfully on alternate". Maybe no news is good news?

Back in the day, mirroring software/hardware would generate syslog entries that indicated when a device was degraded or otherwise required attention. Does md not do that?

Background: the systems in question are already deployed and are being remotely monitored (via syslog/journald info, so no mdadm or any other interactive commands/access of any sort are available at this point).

  • cat /proc/mdstat is an essential starting point. Commented Oct 6, 2019 at 17:22
  • Previously posted (and closed as "off topic") at serverfault. Commented Oct 6, 2019 at 17:22
  • @roaima please see "no interactive commands/access of any sort are available". Commented Oct 6, 2019 at 17:23
  • I see that. I still say that it's an essential starting point, but not as an answer because of your restrictions. (I'm looking to see what other options you've got.) Commented Oct 6, 2019 at 17:24
  • Without interactive access, how do you expect to recover from a failure? If a disk fails and is failed out of the array, then when you replace it you need to tell the md driver to add the replacement disk. Commented Oct 6, 2019 at 17:28

1 Answer

I set up a quick test on a RAID 1 array built from two loop devices.

    dd bs=1M count=100 if=/dev/zero >/tmp/0.img
    cp /tmp/0.img /tmp/1.img
    i0=$(losetup --show --find /tmp/0.img); echo $i0
    i1=$(losetup --show --find /tmp/1.img); echo $i1
    mdadm --create /dev/md99 --metadata default --level 1 --raid-devices 2 $i0 $i1

Setting one half faulty

    mdadm --manage /dev/md99 --set-faulty $i1    # For me, $i1=/dev/loop1

gives me this from the kernel (amongst other related RAID1 messages)

    Oct 6 17:36:10 pi kernel: [4087450.030438] md/raid1:md99: Disk failure on loop1, disabling device
    Oct 6 17:36:10 pi kernel: [4087450.030438] md/raid1:md99: Operation continuing on 1 devices.
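
Since the monitoring in the question only sees syslog/journald output, one option is to filter the incoming log stream for these two messages. The following is a minimal sketch, not a complete monitor: the function name `md_degraded_filter` is made up for illustration, and the pattern is derived only from the two lines shown above, so other RAID levels or kernel versions may word their messages differently and need the pattern adjusted.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print only those that
# look like the md "disk failed" / "array degraded" kernel messages.
# The pattern is an assumption based on the two sample lines in this answer.
md_degraded_filter() {
    grep -E 'md/raid[0-9]*:md[^:]*: (Disk failure on [^,]+, disabling device|Operation continuing on [0-9]+ devices)'
}
```

On a system with journald this could be fed with something like `journalctl -k -f | md_degraded_filter`; with plain syslog, the relevant log file can be piped in instead.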
