On Wed, 5 Sep 2012, Keith Monahan wrote:
On 9/5/2012 2:51 AM, Tothwolf wrote:
On Tue, 4 Sep 2012, Hollandia at ccountry.net wrote:
Will someone name a program that will do a "checkup" on a hard drive,
one that could warn of an impending failure?
Thanks,
Kurt
The first person who comes up with a way to reliably predict drive
failure would become an overnight billionaire.
I've just recently had SMART catch an impending failure.
Drives have what are called "pre-failure" attributes, and if the
normalized values of those attributes drop to or below the threshold, then
the drive is considered to be failing. The drive manufacturer will
(generally) honor the warranty if THEIR SMART utilities confirm the failure.
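A simplified sketch of the threshold rule described above, assuming the attribute rows have already been read into normalized VALUE/WORST/THRESH numbers the way smartctl prints them. The class and function names are hypothetical, and a real drive's firmware and the vendor's own utility may apply additional logic.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One row of a SMART attribute table (hypothetical container)."""
    id: int
    name: str
    value: int   # current normalized value; higher is healthier
    worst: int   # lowest normalized value ever recorded
    thresh: int  # vendor threshold; 0 means the attribute can never trip
    type: str    # "Pre-fail" or "Old_age"


def attribute_failing(attr: SmartAttribute) -> bool:
    """True if a pre-failure attribute's normalized value is at or below its threshold."""
    return attr.type == "Pre-fail" and attr.thresh > 0 and attr.value <= attr.thresh


def health_failed(attrs: list[SmartAttribute]) -> bool:
    """Simplified overall verdict: the drive is 'failing' if any pre-fail attribute trips."""
    return any(attribute_failing(a) for a in attrs)


# Hypothetical values, for illustration only.
healthy = SmartAttribute(5, "Reallocated_Sector_Ct", 100, 100, 36, "Pre-fail")
worn = SmartAttribute(5, "Reallocated_Sector_Ct", 30, 30, 36, "Pre-fail")
old_age = SmartAttribute(193, "Load_Cycle_Count", 1, 1, 0, "Old_age")
print(health_failed([healthy, old_age]), health_failed([worn, old_age]))
```

Note that an Old_age attribute with a threshold of 000 (like the Load_Cycle_Count row quoted later in this thread) never trips this check, however bad its raw count gets.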
A 1TB Seagate (7200.11) failed with a Reallocated Sector Count of 4153 (!!).
It was also indicating some Offline Uncorrectables. Seagate's utility offered
up a defect verification code (or whatever it is called), and off to Seagate
it went. They replaced it, although their RMA process was SLOW SLOW SLOW.
I managed to copy all the data off successfully, but it started making some
physical noises during the copy, further confirming (to me, anyway) that it
was on its way out.
SMART isn't perfect, and is definitely not a replacement for good backups,
but it's better than nothing.
SMART is better than nothing, although it too isn't all that reliable of a
metric. I've had drives up and die without so much as triggering a SMART
warning and I have drives with a great many reallocated sectors still
plugging away, some with now well over 100,000 power on hours (with the
data they contain backed up, of course).
On the other hand, far too many people think SMART is a _reliable_ way to
predict drive failure and try to depend on it to "warn" them /before/
their drive fails. This /eventually/ leads to disastrous results since
they usually don't bother to back up their files. Of course these tend to
be the same types of people who actually think RAID is a valid alternative
to keeping backups... To wit: multiple drive failure.
One of the key things I've discovered with higher quality drives is that
the lower the number of head retractions (and spin down cycles) the longer
the drive seems to last. I initially discovered this purely by accident,
and this is something you /never/ want to see:
=== START OF READ SMART DATA SECTION ===
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
[snip]
  9 Power_On_Hours          0x0012   053   053   000    Old_age   Always       -       20827
193 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       2131639
~2.1 million head load/unload cycles...
2131639 load/unload cycles / 20827 power on hours = 102.35 cycles per hour
102.35 cycles per hour / 60 minutes = 1.71 cycles per minute
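The same arithmetic, sketched in Python. It assumes the ten-column smartctl layout quoted above with a plain integer RAW_VALUE in the last column (some drives report composite raw values, which this hypothetical parser does not handle). Since Power_On_Hours only counts time while the drive is powered, the result is a rate per powered-on hour.

```python
def parse_attribute_line(line: str) -> tuple[int, str, int]:
    """Split one smartctl attribute row into (id, name, raw_value)."""
    fields = line.split()
    # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    return int(fields[0]), fields[1], int(fields[9])


def cycle_rates(rows: list[str]) -> tuple[float, float]:
    """Return (load cycles per powered-on hour, load cycles per minute)."""
    raw = {name: value for _, name, value in map(parse_attribute_line, rows)}
    per_hour = raw["Load_Cycle_Count"] / raw["Power_On_Hours"]
    return per_hour, per_hour / 60


rows = [
    "  9 Power_On_Hours          0x0012   053   053   000    Old_age   Always       -       20827",
    "193 Load_Cycle_Count        0x0012   001   001   000    Old_age   Always       -       2131639",
]
per_hour, per_minute = cycle_rates(rows)
print(f"{per_hour:.2f} cycles/hour, {per_minute:.2f} cycles/minute")
# prints "102.35 cycles/hour, 1.71 cycles/minute"
```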
More background:
http://tothwolf.livejournal.com/35252.html
Ultimately Linux itself wasn't the cause; the hard drive just defaulted to
a very, very dumb power management mode. That default might not have been
as bad with a FAT32 or vfat filesystem, but filesystems such as ext2/ext3
constantly want to update atime, so with my drive it turned out the heads
would retract and reload roughly 1.71 times per minute.