RAID? Was: PATA hard disks, anyone?

Richard Pope mechanic_2 at charter.net
Wed Mar 28 20:57:58 CDT 2018


Fred,
     I appreciate the explanation. So without 1,000, 10,000, or even 
100,000 drives, there is no way to know how long the drives in my RAID 
will last. All I know for sure is that I can lose any one drive and the 
RAID can be rebuilt.
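A small sketch of why "any one drive" is the safe statement for a four-drive RAID 10. It assumes the usual layout of two mirrored pairs striped together, that any drive is equally likely to fail, and that there is no rebuild between failures; the drive numbering and pair layout are illustrative assumptions. It counts which combinations of failed drives the array survives.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical layout: drives 0-3 arranged as two mirrored pairs,
# striped together (the common RAID 10 arrangement).
MIRRORS = [(0, 1), (2, 3)]

def array_survives(failed):
    """The array survives as long as every mirror still has at least one drive."""
    return all(not set(pair) <= failed for pair in MIRRORS)

def fraction_surviving(k, n_drives=4):
    """Fraction of all k-drive failure combinations that the array survives."""
    combos = list(combinations(range(n_drives), k))
    ok = sum(array_survives(set(c)) for c in combos)
    return Fraction(ok, len(combos))

for k in range(1, 5):
    print(k, "failed:", fraction_surviving(k), "of combinations survivable")
```

Every single-drive failure is survivable, and two of the three possible second failures are too (only losing the partner of the first dead drive is fatal). Three failures always take out a whole mirror.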
GOD Bless and Thanks,
rich!

On 3/28/2018 4:43 PM, Fred Cisin via cctalk wrote:
> On Wed, 28 Mar 2018, Richard Pope via cctalk wrote:
>>    I have been kind of following this thread. I have a question about 
>> MTBF. I have four HGST Ultrastar Enterprise 2TB drives set up in a 
>> hardware RAID 10 configuration. If the MTBF is 100,000 hours for 
>> each drive, does this mean that the total MTBF is 25,000 hours?
>
> <pedantic sadistics>
> Probably NOT.
> It depends extremely heavily on the shape of the curve of failure times.
> MTBF (Mean Time Between Failures) is, of course, a MEAN.  If the 
> curve of failure times is symmetrical, then for a large enough sample, 
> half the drives fail before 100,000 hours and half after, so at 
> 100,000 hours half are dead.  Otherwise, not necessarily.
>
> But, how evenly distributed are the failures?
> Besides the MTBF, it would help to know the variance or standard 
> deviation.
> It is unlikely that the failures follow a "normal distribution" (or 
> "Laplace-Gauss") bell curve.  And, other distributions are certainly 
> not ABnormal :-)
>
> If the curve is symmetrical, then the mean, median, and mode will all 
> be the same.  If it is not symmetrical, then they won't be. Hence the 
> use of MEDIAN - at that point half are dead, half are still alive.
> In toxicology, there is the concept of an LD-50 dosage: the dose that 
> will kill half.  For example, antibiotic-resistant bacteria might 
> require an incredibly large dose to get that last one, but the LD-50 
> provides a convenient way to get a single number.
> Read as a median, 100,000 hours would be the LD-50 of those drives.
>
>
> If it turns out that the drives last 100,000 hours, plus or minus 10%, 
> then you have a curve with a very steep slope.  It is still half dead 
> at 100,000, but maybe hardly any dead until 90,000, hardly any left 
> alive at 110,000.
>
> OTOH, if the failures were evenly distributed throughout a life of 0 
> to 200,000 hours, with the same number going every day, then that would 
> also have an MTBF of 100,000.  In THAT case, the expected time to the 
> first of four failures would be 200,000/(4+1) = 40,000 hours; the simple 
> 100,000/4 = 25,000 answer holds only for a constant failure rate.
>
>
> They rarely work that way.  Often our devices will have what is 
> sometimes called a "bathtub curve".  There are a few failures 
> IMMEDIATELY ("infant mortality") falling off rapidly, and then few 
> failures for quite a while, and then, as random parts start to wear 
> out, the failures rise.  In fact, with the same MTBF of 100,000, it 
> could be that, once the early-demise ones are discarded, the MTBF of 
> the REMAINDER might be 200,000.
>
> IFF you are willing to deal with the DOA and infant mortality cases, 
> then by discarding or ignoring those outlying numbers, you might get a 
> more realistic evaluation of what to expect.
> </pedantic sadistics>
>
> -- 
> Grumpy Ol' Fred             cisin at xenosoft.com
>
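To see how far the mean and the median can drift apart, here is a sketch using the Weibull distribution, a common model for component lifetimes. That choice is an assumption: nothing in the thread says what the real Ultrastar failure curve looks like. The scale is chosen so every case has a mean of 100,000 hours. Shape 1.0 is the constant-failure-rate case, a shape below 1 is dominated by early failures, and a shape around 3.6 is close to symmetrical.

```python
import math

MTBF = 100_000.0  # hours, treated as the MEAN lifetime

def weibull_scale(mean, shape):
    """Scale parameter that gives a Weibull(shape) distribution the requested mean."""
    return mean / math.gamma(1 + 1 / shape)

def weibull_median(mean, shape):
    """Median lifetime (the LD-50 point) for that Weibull distribution."""
    return weibull_scale(mean, shape) * math.log(2) ** (1 / shape)

def fraction_dead_at(t, mean, shape):
    """Weibull CDF: fraction of drives that have failed by time t."""
    return 1 - math.exp(-((t / weibull_scale(mean, shape)) ** shape))

for shape in (0.5, 1.0, 3.6):
    print(f"shape {shape}: median {weibull_median(MTBF, shape):,.0f} h, "
          f"{fraction_dead_at(MTBF, MTBF, shape):.1%} dead at the MTBF")
```

For shape 1.0 the median is 100,000 * ln 2, about 69,315 hours, and about 63.2% of the drives have failed by 100,000 hours. "Half are dead at the MTBF" holds only when the curve is roughly symmetrical.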

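A sketch of the arithmetic behind the 25,000-hour guess. It assumes four independent drives with identical lifetime distributions, each with a mean of 100,000 hours. The expected time to the first failure is the integral over t of S(t)**4, where S(t) is the fraction of drives still alive at time t. Dividing the MTBF by four is exact only for a constant failure rate (exponential lifetimes). The two distributions and the integration settings below are illustrative choices, not drive data.

```python
import math

N_DRIVES = 4
MTBF = 100_000.0  # hours per drive, treated as the mean lifetime

def surv_exponential(t):
    """Fraction alive at t under a constant failure rate with mean MTBF."""
    return math.exp(-t / MTBF)

def surv_uniform(t):
    """Fraction alive at t when failures are spread evenly over 0..2*MTBF."""
    return max(0.0, 1.0 - t / (2 * MTBF))

def mean_first_failure(surv, n=N_DRIVES, t_max=20 * MTBF, steps=200_000):
    """E[min of n independent lifetimes] = integral of surv(t)**n dt.

    Approximated with the midpoint rule on [0, t_max]; t_max is long enough
    that both survival curves above are effectively zero by then.
    """
    dt = t_max / steps
    return dt * sum(surv((i + 0.5) * dt) ** n for i in range(steps))

print(f"exponential: {mean_first_failure(surv_exponential):,.0f} h")
print(f"uniform:     {mean_first_failure(surv_uniform):,.0f} h")
```

The exponential case gives 25,000 hours and the evenly spread 0 to 200,000 hour case gives 40,000 hours, even though both have the same 100,000-hour MTBF. In RAID 10 the first drive failure is not data loss anyway; it only opens the window until the rebuild finishes.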

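The "REMAINDER might be 200,000" scenario can be checked with a little arithmetic on means alone, treating the drive population as a mix of early failures and long-lived survivors. This looks only at means, not at the full bathtub-shaped failure rate. The 1,000-hour mean for early failures and the 5% early-failure rate are hypothetical numbers chosen for illustration.

```python
def mixture_mean(p_early, mean_early, mean_rest):
    """Overall mean lifetime when a fraction p_early of drives fail early."""
    return p_early * mean_early + (1 - p_early) * mean_rest

def early_fraction_needed(overall_mean, mean_early, mean_rest):
    """Early-failure fraction that makes the mixture's mean equal overall_mean."""
    return (mean_rest - overall_mean) / (mean_rest - mean_early)

def mean_of_rest(overall_mean, p_early, mean_early):
    """Mean lifetime of the survivors once the early failures are discarded."""
    return (overall_mean - p_early * mean_early) / (1 - p_early)

OVERALL = 100_000  # quoted MTBF, hours
EARLY = 1_000      # hypothetical mean life of infant-mortality drives, hours

p = early_fraction_needed(OVERALL, EARLY, 200_000)
print(f"early failures needed for a 200,000 h remainder: {p:.1%}")
print(f"remainder MTBF with 5% early failures: {mean_of_rest(OVERALL, 0.05, EARLY):,.0f} h")
```

Under these assumptions, doubling the remainder's MTBF to 200,000 hours takes roughly half the drives (about 50.3%) dying early. At a more modest 5% early-failure rate, discarding them raises the remainder's MTBF only to about 105,000 hours.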
