On Wed, Mar 28, 2018 at 09:33:38AM -0400, Paul Koning via cctalk wrote:
[...]
> The basic assumption is that failures are "fail stop", i.e., a drive refuses
> to deliver data. (In particular, it doesn't lie -- deliver wrong data. You
> can build systems that deal with lying drives but RAID is not such a system.)
> The failure may be the whole drive ("it's a door-stop") or individual blocks
> (hard read errors).
The assumption that disks don't lie is demonstrably false, and anybody who
still designs or sells a system in 2018 which makes that assumption is a
charlatan. I have hardware which proves it.
Sun's ZFS filesystem applies an extra "trust but verify" layer of protection
using strong checksums. I have a server with a pair of mirrored 3TB enterprise
disks which are "zpool scrub"bed (surface-scanned and checksums verified) weekly.
Every few months, the scrub will hit a bad checksum which shows that the disk
read back different data to that which was written, even though the disk
claimed the read was OK. At best (and most likely) the problem was a single bit
flip, i.e. roughly a 1 in 1.8e13 error rate. So much for the manufacturer's
claim of less than 1 in 1e15 for that model of disk.
A workstation with a pair of 512GB consumer-grade SSDs has a half-dozen bad
stripes in every scrub performed after the machine has been powered down for a
week or so. The SSDs have just a few hundred hours on the clock and perhaps
three full drive writes. I love the performance of SSDs, but they are
appallingly unreliable for even medium-term storage.
Fortunately, ZFS can tell from the checksums which half of the mirror is lying,
and thus rewrite the stripe based on the known-good copy. It even handles the
case where both disks have some errors. Traditional RAID just cannot self-heal
like that.
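To illustrate the idea, here is a minimal Python sketch of a checksummed two-way
mirror that repairs itself on read. It is a toy model, not how ZFS is actually
implemented: the MirroredStore class and its methods are hypothetical names, the
"disks" are in-memory dicts, and SHA-256 is used only for simplicity (ZFS offers
several checksum algorithms and keeps the checksum in the parent block pointer
rather than in a flat table).

```python
import hashlib


class MirroredStore:
    """Toy two-way mirror with checksums stored apart from the data."""

    def __init__(self, copies=2):
        self.disks = [dict() for _ in range(copies)]  # block number -> bytes
        self.checksums = {}                           # block number -> digest

    def write(self, block, data):
        # Record the checksum first, then write the same data to every disk.
        self.checksums[block] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block] = data

    def read(self, block):
        """Return (data, repaired_disks), healing any copy that fails its checksum."""
        expected = self.checksums[block]
        good = None
        bad = []
        for index, disk in enumerate(self.disks):
            data = disk.get(block)
            if data is not None and hashlib.sha256(data).digest() == expected:
                if good is None:
                    good = data
            else:
                bad.append(index)  # this disk "lied" (or lost the block)
        if good is None:
            raise IOError("block %d: no copy matches its checksum" % block)
        for index in bad:
            self.disks[index][block] = good  # rewrite from the known-good copy
        return good, bad

    def scrub(self):
        """Read and verify every block; return the number of copies repaired."""
        repaired = 0
        for block in self.checksums:
            _, bad = self.read(block)
            repaired += len(bad)
        return repaired


store = MirroredStore()
store.write(0, b"hello")
store.write(1, b"world")
store.disks[0][0] = b"hellp"   # silent corruption on the first disk
store.disks[1][1] = b"worle"   # different block corrupted on the second disk
print(store.scrub())           # both bad copies are found and rewritten
```

Because the checksum lives apart from the data, the reader can tell which copy
is wrong rather than merely that the two copies differ. That is also why errors
on both disks are survivable, provided they do not hit the same block.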