A friend of mine has an mdadm raid5 with 9 disks that does not assemble anymore.
After having a look at the syslog, I found that the disk sdi had been kicked from the array:
Jul 6 08:43:25 nasty kernel: [ 12.952194] md: bind<sdc>
Jul 6 08:43:25 nasty kernel: [ 12.952577] md: bind<sdd>
Jul 6 08:43:25 nasty kernel: [ 12.952683] md: bind<sde>
Jul 6 08:43:25 nasty kernel: [ 12.952784] md: bind<sdf>
Jul 6 08:43:25 nasty kernel: [ 12.952885] md: bind<sdg>
Jul 6 08:43:25 nasty kernel: [ 12.952981] md: bind<sdh>
Jul 6 08:43:25 nasty kernel: [ 12.953078] md: bind<sdi>
Jul 6 08:43:25 nasty kernel: [ 12.953169] md: bind<sdj>
Jul 6 08:43:25 nasty kernel: [ 12.953288] md: bind<sda>
Jul 6 08:43:25 nasty kernel: [ 12.953308] md: kicking non-fresh sdi from array!
Jul 6 08:43:25 nasty kernel: [ 12.953314] md: unbind<sdi>
Jul 6 08:43:25 nasty kernel: [ 12.960603] md: export_rdev(sdi)
Jul 6 08:43:25 nasty kernel: [ 12.969675] raid5: device sda operational as raid disk 0
Jul 6 08:43:25 nasty kernel: [ 12.969679] raid5: device sdj operational as raid disk 8
Jul 6 08:43:25 nasty kernel: [ 12.969682] raid5: device sdh operational as raid disk 6
Jul 6 08:43:25 nasty kernel: [ 12.969684] raid5: device sdg operational as raid disk 5
Jul 6 08:43:25 nasty kernel: [ 12.969687] raid5: device sdf operational as raid disk 4
Jul 6 08:43:25 nasty kernel: [ 12.969689] raid5: device sde operational as raid disk 3
Jul 6 08:43:25 nasty kernel: [ 12.969692] raid5: device sdd operational as raid disk 2
Jul 6 08:43:25 nasty kernel: [ 12.969694] raid5: device sdc operational as raid disk 1
Jul 6 08:43:25 nasty kernel: [ 12.970536] raid5: allocated 9542kB for md127
Jul 6 08:43:25 nasty kernel: [ 12.973975] 0: w=1 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973980] 8: w=2 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973983] 6: w=3 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973986] 5: w=4 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973989] 4: w=5 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973992] 3: w=6 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973996] 2: w=7 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.973999] 1: w=8 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 6 08:43:25 nasty kernel: [ 12.974002] raid5: raid level 5 set md127 active with 8 out of 9 devices, algorithm 2

Unfortunately this went unnoticed, and a week later a second drive (sde) was kicked as well:
Jul 14 08:02:45 nasty kernel: [ 12.918556] md: bind<sdc>
Jul 14 08:02:45 nasty kernel: [ 12.919043] md: bind<sdd>
Jul 14 08:02:45 nasty kernel: [ 12.919158] md: bind<sde>
Jul 14 08:02:45 nasty kernel: [ 12.919260] md: bind<sdf>
Jul 14 08:02:45 nasty kernel: [ 12.919361] md: bind<sdg>
Jul 14 08:02:45 nasty kernel: [ 12.919461] md: bind<sdh>
Jul 14 08:02:45 nasty kernel: [ 12.919556] md: bind<sdi>
Jul 14 08:02:45 nasty kernel: [ 12.919641] md: bind<sdj>
Jul 14 08:02:45 nasty kernel: [ 12.919756] md: bind<sda>
Jul 14 08:02:45 nasty kernel: [ 12.919775] md: kicking non-fresh sdi from array!
Jul 14 08:02:45 nasty kernel: [ 12.919781] md: unbind<sdi>
Jul 14 08:02:45 nasty kernel: [ 12.928177] md: export_rdev(sdi)
Jul 14 08:02:45 nasty kernel: [ 12.928187] md: kicking non-fresh sde from array!
Jul 14 08:02:45 nasty kernel: [ 12.928198] md: unbind<sde>
Jul 14 08:02:45 nasty kernel: [ 12.936064] md: export_rdev(sde)
Jul 14 08:02:45 nasty kernel: [ 12.943900] raid5: device sda operational as raid disk 0
Jul 14 08:02:45 nasty kernel: [ 12.943904] raid5: device sdj operational as raid disk 8
Jul 14 08:02:45 nasty kernel: [ 12.943907] raid5: device sdh operational as raid disk 6
Jul 14 08:02:45 nasty kernel: [ 12.943909] raid5: device sdg operational as raid disk 5
Jul 14 08:02:45 nasty kernel: [ 12.943911] raid5: device sdf operational as raid disk 4
Jul 14 08:02:45 nasty kernel: [ 12.943914] raid5: device sdd operational as raid disk 2
Jul 14 08:02:45 nasty kernel: [ 12.943916] raid5: device sdc operational as raid disk 1
Jul 14 08:02:45 nasty kernel: [ 12.944776] raid5: allocated 9542kB for md127
Jul 14 08:02:45 nasty kernel: [ 12.944861] 0: w=1 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944864] 8: w=2 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944867] 6: w=3 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944871] 5: w=4 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944874] 4: w=5 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944877] 2: w=6 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944879] 1: w=7 pa=0 pr=9 m=1 a=2 r=9 op1=0 op2=0
Jul 14 08:02:45 nasty kernel: [ 12.944882] raid5: not enough operational devices for md127 (2/9 failed)

Now the array does not start anymore. However, every disk still seems to contain the raid metadata; the mdadm --examine output for each member is shown below.
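The dumps were collected with a small loop along these lines (a sketch; the device list is our assumption, taken from the bind<...> messages above):

# Dump the md superblock of every member disk
# (device list assumed from the kernel log; sdb never appears there)
for d in /dev/sda /dev/sd[c-j]; do
    mdadm --examine "$d"
done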
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 8600bda9:18845be8:02187ecc:1bfad83a

    Update Time : Mon Jul 14 00:45:35 2014
       Checksum : e38d46e8 - correct
         Events : 123132

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAA.AAA.A ('A' == active, '.' == missing)

/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : fe612c05:f7a45b0a:e28feafe:891b2bda

    Update Time : Mon Jul 14 00:45:35 2014
       Checksum : 32bb628e - correct
         Events : 123132

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAA.AAA.A ('A' == active, '.' == missing)

/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 1d14616c:d30cadc7:6d042bb3:0d7f6631

    Update Time : Mon Jul 14 00:45:35 2014
       Checksum : 62bd5499 - correct
         Events : 123132

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAA.AAA.A ('A' == active, '.' == missing)

/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : a2babca3:1283654a:ef8075b5:aaf5d209

    Update Time : Mon Jul 14 00:45:07 2014
       Checksum : f78d6456 - correct
         Events : 123123

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAAAAA.A ('A' == active, '.' == missing)

/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e67d566d:92aaafb4:24f5f16e:5ceb0db7

    Update Time : Mon Jul 14 00:45:35 2014
       Checksum : 9223b929 - correct
         Events : 123132

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 4
    Array State : AAA.AAA.A ('A' == active, '.' == missing)

/dev/sdg:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 2cee1d71:16c27acc:43e80d02:1da74eeb

    Update Time : Mon Jul 14 00:45:35 2014
       Checksum : 7512efd4 - correct
         Events : 123132

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 5
    Array State : AAA.AAA.A ('A' == active, '.' == missing)

/dev/sdh:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c239f0ad:336cdb88:62c5ff46:c36ea5f8

    Update Time : Mon Jul 14 00:45:35 2014
       Checksum : c08e8a4d - correct
         Events : 123132

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 6
    Array State : AAA.AAA.A ('A' == active, '.' == missing)

/dev/sdi:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : d06c58f8:370a0535:b7e51073:f121f58c

    Update Time : Mon Jul 14 00:45:07 2014
       Checksum : 77844dcc - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : spare
    Array State : AAAAAAA.A ('A' == active, '.' == missing)

/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f2de262f:49d17fea:b9a475c1:b0cad0b7

    Update Time : Mon Jul 14 00:45:35 2014
       Checksum : dd0acfd9 - correct
         Events : 123132

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 8
    Array State : AAA.AAA.A ('A' == active, '.' == missing)

As you can see, the superblocks of the two kicked drives (sde, sdi) still report State "active" (even though the raid is stopped), and sdi's Device Role is now "spare". sde's event count (123123) is only slightly behind that of the other drives (123132), whereas sdi's event count is 0. So sde appears to be almost up to date, but sdi does not.
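For a side-by-side view of just the event counters, a quick loop like this helps (same assumed device list as above):

# Print each device name together with its event counter
for d in /dev/sda /dev/sd[c-j]; do
    printf '%s: ' "$d"
    mdadm --examine "$d" | awk '/Events/ {print $3}'
done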
We read online that a hard power-off can cause these "kicking non-fresh" messages, and indeed my friend did hard-power-off the machine once or twice. So we followed the instructions we found online and tried to re-add sde to the array:
$ mdadm /dev/md127 --add /dev/sde
mdadm: add new device failed for /dev/sde as 9: Invalid argument

That failed, and mdadm --examine /dev/sde now shows an event count of 0 for sde as well (and it has become a spare, like sdi):
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b8a04dbb:0b5dffda:601eb40d:d2dc37c9
           Name : nasty:stuff (local to host nasty)
  Creation Time : Sun Mar 16 02:37:47 2014
     Raid Level : raid5
   Raid Devices : 9

 Avail Dev Size : 7814035120 (3726.02 GiB 4000.79 GB)
     Array Size : 62512275456 (29808.18 GiB 32006.29 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 689e0030:142122ae:7ab37935:c80ab400

    Update Time : Mon Jul 14 00:45:35 2014
       Checksum : 5e6c4cf7 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : spare
    Array State : AAA.AAA.A ('A' == active, '.' == missing)

We know that two failed drives usually mean the death of a raid5. However, is there a way to get at least sde back into the array so that the data can be saved?
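To make the question concrete: what we are contemplating, but have not run yet, is stopping the array and forcing an assemble from the eight devices that held current data before the --add attempt (a sketch only; the device list is our assumption):

# Stop the half-assembled array, then try a forced assemble from the
# eight members with recent superblocks (sketch -- not yet executed)
mdadm --stop /dev/md127
mdadm --assemble --force /dev/md127 \
    /dev/sda /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdj

Or is that ruled out now that sde's superblock reports Events : 0 and Device Role : spare?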