Tuesday, July 21, 2009

On ZFS and terror

I've had my main backup / nameless cruft repository on 3 x 1TB drives in a ZFS RAID-Z array for a while now, and I finally found out what happens when ZFS (or Linux ZFS-FUSE, at least) encounters bad blocks.
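For context, the pool was an ordinary single-parity RAID-Z. A rough sketch of how such a pool gets built under zfs-fuse (device names here are made up, and "tank" is just the conventional example name; start the daemon however your distro does it):

  # start the zfs-fuse daemon first (init script or just running zfs-fuse)
  sudo zfs-fuse
  # build a single-parity RAID-Z vdev out of three whole disks
  sudo zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd
  sudo zpool status tank    # should report a healthy raidz1 vdev with 3 disks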

It loses it, completely.

I suspect this is largely a Linux problem: once a bad block is encountered, the Linux kernel (?) seems to go into an infinite-reread tailspin, flooding syslog with errors and generally making the SATA bus (and the machine at large) unusable. So it might not be ZFS' fault.

And a quick aside to the many posts claiming that "the OS should never see a bad block; the drive should silently remap the block to a spare, and the OS will only be aware of it when you've run out of spares!": bullshit. I've had half a dozen drives turn up w/ bad blocks, the SMART stats on each reported plenty of spare sectors remaining, and each drive was fine after a forced overwrite (i.e., I zeroed out the whole drive).
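For anyone who wants to check this on their own hardware, it looks roughly like this (smartmontools installed, /dev/sdX standing in for whichever drive is suspect, and note that the dd step destroys everything on it):

  # spare-sector bookkeeping; attribute names vary a bit between vendors
  sudo smartctl -A /dev/sdX | grep -i -e reallocated -e pending

  # rewrite every sector so the firmware gets a chance to remap the bad ones;
  # this wipes the drive
  sudo dd if=/dev/zero of=/dev/sdX bs=1M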

So I repeated my zero-out procedure on this drive. The sequence of events gets fuzzy here, but at some point I backed up /etc, and at some later point I ran zpool export on the DEGRADED array. When the time came to put the newly zeroed drive back in the array, zpool wouldn't bring it back up: zpool import reported that "The pool can be imported despite missing or damaged devices," but zpool import tank ("tank" being the customary name for ZFS pools, I think) complained that it "cannot import 'tank': one or more devices is currently unavailable." zpool import -f tank yielded the same thing.
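Reconstructed from memory, the sequence went something like this (output paraphrased from above):

  sudo zpool export tank       # exported while the pool was still DEGRADED
  sudo zpool import            # lists the pool: "can be imported despite
                               # missing or damaged devices"
  sudo zpool import tank       # "cannot import 'tank': one or more devices
                               # is currently unavailable"
  sudo zpool import -f tank    # same error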

This was when I began to panic.

Even with two fully functional drives, all my data would have been lost. But remember that backup of /etc, notably including /etc/zfs/zpool.cache? That was my salvation: restoring it brought me back to my pre-export, degraded state, which in turn let me replace the faulty drive.
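The recovery itself, in rough strokes (the backup path below is just wherever I'd copied /etc to, and the device name is whichever slot held the zeroed drive):

  # put the cache file back so ZFS remembers the pool's pre-export state
  sudo cp /path/to/etc-backup/zfs/zpool.cache /etc/zfs/zpool.cache
  # restart the zfs-fuse daemon so it re-reads the cache file on startup

  sudo zpool status tank              # DEGRADED again, but at least visible
  sudo zpool replace tank /dev/sdd    # resilver onto the freshly zeroed drive
  sudo zpool status tank              # watch the resilver run to completion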

My data is safe for now, but my confidence in ZFS is shaken. Why did exporting a degraded array make it un-importable? If I lose my zpool.cache, is all lost?
