RAID? Was: PATA hard disks, anyone?
Fred Cisin
cisin at xenosoft.com
Wed Mar 28 18:43:43 CDT 2018
On Wed, 28 Mar 2018, Richard Pope via cctalk wrote:
> I have been kind of following this thread. I have a question about MTBF.
> I have four HGST UltraStar Enterprise 2TB drives set up in a Hardware RAID 10
> configuration. If the MTBF is 100,000 Hrs for each drive does this mean
> that the total MTBF is 25,000 Hrs?
<pedantic sadistics>
Probably NOT.
It depends extremely heavily on the shape of the curve of failure times.
MEAN Time Before Failure is commonly read as meaning that, for a large
enough sample, half the drives fail before 100,000 hours, and half after.
Thus, at 100,000 hours, half are dead. Strictly, that half-dead point is
the median, which need not equal the mean.
But, how evenly distributed are the failures?
Besides the MTBF, it would help to know the variance or standard
deviation.
It is unlikely that the failures follow a "normal distribution"
(or "Laplace-Gauss") bell curve. And, other distributions are
certainly not ABnormal :-)
If the curve is symmetrical, then the mean, median, and mode will all be
the same. If it is not symmetrical, then they won't be. Hence the use of
MEDIAN - at that point half are dead, half are still alive.
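To put rough numbers on the mean/median point, here is a minimal Python
sketch. The two distributions, the 5,000 hour standard deviation, and the
helper name mean_and_median are illustrative assumptions, not data about
any real drive.

```python
import math
import random
import statistics

def mean_and_median(samples):
    """Return (mean, median) of a list of lifetimes in hours."""
    return statistics.fmean(samples), statistics.median(samples)

rng = random.Random(0)   # fixed seed so the run is repeatable
MTBF = 100_000           # hours, the figure from the question

# Symmetric case: a normal curve. The 5,000 hour spread is an assumption.
symmetric = [rng.gauss(MTBF, 5_000) for _ in range(100_000)]

# Skewed case: exponential lifetimes (constant failure rate), same mean.
skewed = [rng.expovariate(1 / MTBF) for _ in range(100_000)]

print(mean_and_median(symmetric))  # mean and median both close to 100,000
print(mean_and_median(skewed))     # mean near 100,000, median near 69,300
print(MTBF * math.log(2))          # theoretical exponential median
```

With the skewed curve, half the drives are dead well before the mean is
reached, which is why the single MTBF number says little on its own.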
In toxicology, there is a concept of an LD-50 dosage - the dosage that
will kill half. Antibiotic resistant bacteria, for example, might
require an incredibly large dosage to get that last one, but LD-50
provides a convenient way to get a single number.
Treating the 100,000 hour figure as that median, it is the LD-50 of those
drives.
If it turns out that the drives last 100,000 hours, plus or minus 10%,
then you have a curve with a very steep slope. It is still half dead at
100,000, but maybe hardly any dead until 90,000, hardly any left alive at
110,000.
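OTOH, if the failures were evenly distributed throughout a life of 0 to
200,000 hours, with the same number going every day, then that also would
have a MTBF of 100,000. In THAT case, then yes, the first failure among
four drives comes much sooner, though it averages 40,000 hours (200,000/5)
rather than 25,000. The 25,000 figure (100,000/4) is what a constant
failure rate would give.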
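For anyone who wants to check the first-failure arithmetic, a small
simulation sketch follows. It assumes four independent drives and three
made-up lifetime curves with the same 100,000 hour mean: evenly spread
over 0 to 200,000, constant failure rate (exponential), and a narrow bell
curve with an assumed 5,000 hour standard deviation. The function name
mean_first_failure is hypothetical.

```python
import random
import statistics

MTBF = 100_000   # hours per drive, from the question
DRIVES = 4
TRIALS = 50_000

def mean_first_failure(draw_lifetime, rng):
    """Average time until the first of DRIVES independent drives fails."""
    firsts = [
        min(draw_lifetime(rng) for _ in range(DRIVES))
        for _ in range(TRIALS)
    ]
    return statistics.fmean(firsts)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Evenly spread over 0..200,000 hours: theory says 200,000 / 5 = 40,000.
uniform = mean_first_failure(lambda r: r.uniform(0, 2 * MTBF), rng)

# Constant failure rate: theory says 100,000 / 4 = 25,000.
exponential = mean_first_failure(lambda r: r.expovariate(1 / MTBF), rng)

# Narrow bell curve (assumed sd of 5,000): first failure near 95,000.
narrow = mean_first_failure(lambda r: r.gauss(MTBF, 5_000), rng)

print(round(uniform), round(exponential), round(narrow))
```

Same per-drive mean in all three cases, yet the expected wait for the
first dead drive ranges from about 25,000 to about 95,000 hours.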
They rarely work that way. Often our devices will have what is sometimes
called a "bathtub curve". There are a few failures IMMEDIATELY ("infant
mortality") falling off rapidly, and then few failures for quite a while,
and then, as random parts start to wear out, the failures rise.
In fact, with the same MTBF of 100,000, it could be that once the early
demise ones are discarded, the MTBF of the REMAINDER is 200,000.
IFF you are willing to deal with the DOA and infant mortality cases, then
by discarding or ignoring those outlying numbers, you might get a more
realistic evaluation of what to expect.
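The effect of discarding the early deaths is plain weighted-average
arithmetic. A sketch, with a hypothetical helper remainder_mtbf and
made-up early-failure fractions (no vendor publishes these numbers in
this form, as far as I know):

```python
def remainder_mtbf(overall_mtbf, early_fraction, early_mean):
    """Mean life of the survivors once early failures are set aside.

    Solves: overall = early_fraction * early_mean
                      + (1 - early_fraction) * remainder
    """
    if not 0 <= early_fraction < 1:
        raise ValueError("early_fraction must be in [0, 1)")
    return (overall_mtbf - early_fraction * early_mean) / (1 - early_fraction)

# Extreme case: half the drives are DOA (die at hour 0).
print(remainder_mtbf(100_000, 0.5, 0))     # 200000.0

# Milder assumed case: 5% die after about 100 hours on average.
print(remainder_mtbf(100_000, 0.05, 100))  # about 105,258
```

So a remainder MTBF of 200,000 needs half the population to die almost
immediately; a more modest infant mortality shifts the number much less.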
</pedantic sadistics>
--
Grumpy Ol' Fred cisin at xenosoft.com