What should I expect to see if md/linux RAID is properly compensating for a failing drive?

Question

Does the md subsystem output any messages (to syslog/systemd-journal) to indicate that it's running in a degraded state (or anything else that might indicate that it has successfully reacted to a drive failure, as hinted at here)?

For example, I see lots of errors from sd indicating things like "Unrecovered read error", but I don't see anything like "retried successfully on alternate". Maybe no news is good news?

Back in the day, mirroring software/hardware would generate syslog entries that indicated when a device was degraded or otherwise required attention. Does md not do that?

Background: the systems in question are already deployed and are being remotely monitored (via syslog/journald info, so no mdadm or any other interactive commands/access of any sort are available at this point).

Previously posted (and closed as "off topic") at Server Fault. – jhfrontz, commented Oct 6, 2019 at 17:22
@roaima please see "no interactive commands/access of any sort are available". – jhfrontz, commented Oct 6, 2019 at 17:23
I see that. I still say that it's an essential starting point, but not as an answer because of your restrictions. (I'm looking to see what other options you've got.) – Chris Davies, commented Oct 6, 2019 at 17:24
Without interactive access, how do you expect to recover from a failure? If a disk fails and is failed out of the array, then when you replace it you need to tell the md driver to add the replacement disk. – Stephen Harris, commented Oct 6, 2019 at 17:28

Chris Davies · Accepted Answer · 2019-10-08 07:57:05Z

I set up a quick test on a RAID 1 array built from two loop devices.

    dd bs=1M count=100 if=/dev/zero >/tmp/0.img
    cp /tmp/0.img /tmp/1.img
    i0=$(losetup --show --find /tmp/0.img); echo $i0
    i1=$(losetup --show --find /tmp/1.img); echo $i1
    mdadm --create /dev/md99 --metadata default --level 1 --raid-devices 2 $i0 $i1
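To repeat the experiment or clean up afterwards, something like the following could be used. This is a minimal sketch, not part of the original answer: `readd_md99_member` and `teardown_md99` are hypothetical helper names, and both assume they run as root with the `$i0`/`$i1` device names that `losetup` printed above. Re-adding the member reflects Stephen Harris's point in the comments that md has to be told explicitly to add a replacement disk.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the loop-device RAID 1 test rig (/dev/md99).
# Assumes root privileges and the device names stored in $i0/$i1 during setup.

# Remove a member that was marked faulty and add it back; md should then
# rebuild the mirror onto it.
readd_md99_member() {
    local dev=$1
    mdadm --manage /dev/md99 --remove "$dev"
    mdadm --manage /dev/md99 --add "$dev"
}

# Stop the array, detach both loop devices and delete the backing images.
teardown_md99() {
    local dev0=$1 dev1=$2
    mdadm --stop /dev/md99
    losetup --detach "$dev0"
    losetup --detach "$dev1"
    rm -f /tmp/0.img /tmp/1.img
}
```

After the set-faulty step, `readd_md99_member "$i1"` would restore the second half; `teardown_md99 "$i0" "$i1"` removes the whole rig once you are done.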

Setting one half faulty

    mdadm --manage /dev/md99 --set-faulty $i1 # For me, $i1=/dev/loop1

gives me this from the kernel (amongst other related RAID1 messages):

    Oct 6 17:36:10 pi kernel: [4087450.030438] md/raid1:md99: Disk failure on loop1, disabling device
    Oct 6 17:36:10 pi kernel: [4087450.030438] md/raid1:md99: Operation continuing on 1 devices.
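Since the systems are only monitored through syslog/journald, one option is to watch the kernel log for exactly these two messages. The following is a minimal sketch, not part of the original answer: `md_failure_events` and `md_failed_devices` are hypothetical helper names, and the `md/raid[0-9]*` prefix assumes that other md personalities (e.g. md/raid10, or md/raid for RAID 4/5/6) word their messages the same way as the RAID1 lines above.

```shell
#!/usr/bin/env bash
# Hypothetical filters for md failure messages in kernel log text read on stdin
# (for example the output of `journalctl -k` or lines from a syslog file).

# Print every line reporting that a member was failed out of an array, or how
# many devices the array is still running on.
md_failure_events() {
    grep -E 'md/raid[0-9]*:[^:]+: (Disk failure on [^,]+, disabling device|Operation continuing on [0-9]+ devices)'
}

# Print "<array> <device>" for each "Disk failure on ..." message.
# The pattern assumes the exact wording shown in the RAID1 log lines above.
md_failed_devices() {
    sed -nE 's#.*md/raid[0-9]*:([^:]+): Disk failure on ([^,]+), disabling device.*#\1 \2#p'
}
```

For example, `journalctl -k -f | md_failure_events` follows the kernel log and shows only these events; fed the two lines above, `md_failed_devices` would print `md99 loop1`.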
