barrymac

Can ZFS resilvering destroy data when device failures are misreported?

I am moving data into a new ZFS raidz pool with four devices, some of them virtual (file-backed) to facilitate the migration. The system hung completely in the middle of replacing a file-based device with a physical device.

The system did not even respond to SysRq and had to be physically reset. When it came back online, ZFS had decided that only 2 of the 4 devices were online, started resilvering, and began reporting loads of errors. I didn't know how to stop it; it keeps going in the background even when the pool is unmounted.

By the time I managed to get the missing device, which is perfectly fine, back online, it had reported a great many errors.

Does that mean that ZFS has destroyed data while resilvering because of the missing device? Or can it resilver correctly again now that it has its original devices in place?

When it was resilvering with only 2 devices online, it was resilvering on sda3, shown below:

    NAME                             STATE     READ WRITE CKSUM
    zfs_raid                         DEGRADED     0     0 38.5K
      raidz1-0                       DEGRADED     0     0  129K
        sda3                         ONLINE       0     0     0
        sdc2                         ONLINE       0     0     0
        replacing-2                  DEGRADED     0     0     3
          /zfs_jbod2/zfs_raid/zfs.1  OFFLINE      0     0     0
          sdb1                       ONLINE       0     0     0  (resilvering)
        /zfs_jbod/zfs_raid/zfs.2     ONLINE       0     0     0  (resilvering)

errors: 25852 data errors, use '-v' for a list
