On 10/07/2012 01:38 AM, jim s wrote:
> Recently I had a system with a 1.5 TB Seagate grow a count of
> "uncorrectable offline sector count" errors. [...] To complicate
> things a bit, this was part of an LVM ext3 RAID 5 set.
When you have a drive go bad in a RAID 5, it's best to pull the drive,
put a new one in its place, and start a rebuild. That's the point of
using RAID 5. Trying to recover data from the failing drive is mostly a
waste of time. Of course, if the drive wasn't in a RAID 5, mirror,
etc., you wouldn't have that option.
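The rebuild works because the RAID 5 parity block is just the XOR of the data blocks in each stripe, so any one missing member can be recomputed from the survivors. A minimal sketch of that property (illustrative byte strings, not a real array):

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Two data blocks plus their parity, as one stripe of a 3-member RAID 5.
d0 = b"ABCDEFGH"
d1 = b"12345678"
parity = xor_blocks([d0, d1])

# If the disk holding d1 dies, its contents fall out of the survivors.
rebuilt = xor_blocks([d0, parity])
assert rebuilt == d1
```

The same arithmetic is what the controller runs, stripe by stripe, when you swap in the replacement drive.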
The drives do all sorts of magic "under the hood" to try to recover your
data, and they'll remap sectors that get marginal, but when sectors
suddenly become completely unreadable, they don't remap them. If they
did, a read would report no error but return data different from what
was originally written, which would be completely unacceptable.
If you try to write to the bad sectors, the drive may remap them.
However, once a drive starts reporting hard errors, I consider it time
to scrap it.
I've done an absurd amount of data recovery from failing drives,
corrupted RAID arrays, etc., and it is no fun at all. I've had to write
a lot of my own tools along the way, but most of them were so specific
to a particular problem as not to be worth sharing. One possible
exception was a tool I used to recover a 3ware RAID 5 array on which the
RAID metadata had accidentally been destroyed. The content data was
still completely intact. I had to figure out how the parity rotation of
the RAID stripes worked, and write my own code to reconstruct it onto a
new array.
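The core of that kind of reconstruction can be sketched as below. This assumes a left-symmetric parity rotation (the Linux md default; the layout 3ware actually used, and the chunk size, have to be worked out by inspection as described above), and `members`/`chunk` are hypothetical names, not any real tool's API:

```python
def reassemble_raid5(members, chunk):
    """Concatenate the data chunks of a RAID 5 array from raw member
    images, assuming left-symmetric parity rotation.

    members: list of equal-length bytes objects, one per disk, in slot order
    chunk:   stripe unit size in bytes
    """
    n = len(members)
    stripes = len(members[0]) // chunk
    out = bytearray()
    for s in range(stripes):
        # Parity rotates backward through the disks, one stripe at a time.
        parity_disk = n - 1 - (s % n)
        # Left-symmetric: the stripe's data chunks start on the disk just
        # after the parity chunk and wrap around the members in order.
        for i in range(n - 1):
            disk = (parity_disk + 1 + i) % n
            out += members[disk][s * chunk:(s + 1) * chunk]
    return bytes(out)
```

Getting the rotation and the data ordering right is exactly the trial-and-error part: dump a few stripes under each candidate layout and look for the one that yields a recognizable filesystem.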
Eric