I have a problem that cropped up that spans both old systems, with
flaw maps and disks that have flaws to skip over, and current
technology.
In the old days, in my experience with a lot of CDC Winchesters and
removable-pack SMD drives (and Tridents), there were flaw maps you
could use in a controller to figure out where the bad spots were.
When shipped from the factory, the MMD / CMD / EMD / FSD drives (at
least the first two) were not allowed to have more than one false
address mark (FAM) per track, with a limit of maybe 5 for the entire
stack. Later, at least on the MMDs, they had a run of marginal media
and upped the allowed FAM count considerably (10? 15? I don't
recall). Some 10 years later, buying scrap drives, I found some that
had been delivered to customers such as Datapoint for large systems
around the time we got the story about the FAM problem (needing to up
the count), and those had zero flaws; they were better than the ones
I had purchased 10 years before, and still working.
Anyway, recently I had a system with a 1.5 TB Seagate grow a count of
"uncorrectable offline sector" errors. I'm telling this up front
since these are effectively the same as the errors above: sectors
that are not recoverable by the drive and are presented as bad spots
or timeouts when you seek to them.
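For what it's worth, the closest modern equivalent of reading the old
factory flaw map is the drive's SMART attribute table, which
smartmontools will dump for you. A minimal sketch (the device name is
just a placeholder for whichever drive you suspect):

  # Dump the SMART attribute table; Current_Pending_Sector and
  # Offline_Uncorrectable are the counters behind the errors above.
  smartctl -a /dev/sdX

  # Ask the drive to re-scan its whole surface in the background.
  smartctl -t long /dev/sdX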
The errors were not there when I initialized the Linux system on the
drive; they grew later. To complicate things a bit, the drive was
part of a RAID 5 set with LVM and ext3 on top, so there are other
complications here, but the initial build was flawless and ran about
six months before this drive grew 16 bad sectors visible to Linux.
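For reference, this is roughly how such errors show up from the Linux
side, assuming the appliance uses ordinary md RAID with LVM on top
(the device names here are placeholders, not what the appliance
actually calls them):

  cat /proc/mdstat                # array state, any degraded members
  mdadm --detail /dev/md0         # per-member status of the RAID 5 set
  dmesg | grep -i 'i/o error'     # kernel read errors on the bad sectors
  smartctl -A /dev/sdX | grep -i -e pending -e uncorrect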
So I now had the situation where there was a bad spot in a file
(more complicated because this is part of a RAID set, but bear with
me). When you power cycle the drive set, the RAID system I'm using
(Linux based) does a scan of the media and hangs before releasing it
for server operations.
That was the only flaw there was, and I had a brick. Thank heaven I
could put the drives in a desktop system and recover the data (7 TB
of it).
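For anyone in the same spot, the recovery amounted to assembling the
appliance's array on an ordinary Linux box. A rough sketch, assuming
standard md RAID with LVM and ext3 as described above (the volume
names are placeholders for whatever the appliance created):

  mdadm --assemble --scan        # find and assemble the md array(s)
  vgscan && vgchange -ay         # activate the LVM volume group
  mount -o ro /dev/vg0/lv0 /mnt  # mount the ext3 volume read-only and copy off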
Anyway, have we lost the capability, with the likes of Linux, to run
with flaws growing on media at the level where transfers come from
the drive target to the host, or did this vendor of RAID equipment
(the appliance was a ReadyNAS NV2+) have a flaw in their bring-up
procedure?
I am glad I shopped around and got a RAID 5 system like this, built
on Linux, that I could take apart and troubleshoot with any Linux
tools, rather than hardware RAID. Dodged that bullet.
But I am disappointed even so with the behavior of the RAID set when
I put it on my recovery system. I think there is a basic loss of
what one would have been accustomed to in earlier times: instead of
running with defective media, current systems throw their hands up.
I was only able to narrow down the error with careful manual dd
commands and shell scripting (lacking a better tool) to see the
errors. There is a nice timesaving web page if you hunt around the
results of searching for SMART errors. At least that is a nice tool,
and it tells a lot about the drive.
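For completeness, here is the shape of the shell scripting I mean:
read the suspect drive in fixed-size chunks with dd, note which
chunks fail, then re-run over a failed chunk with a smaller block
size to pin down the exact sectors. The device name and chunk size
are just placeholders:

  #!/bin/sh
  # Scan a drive in 1 MiB chunks and report chunks that fail to read.
  DEV=/dev/sdX            # placeholder - the suspect drive
  BS=1048576              # 1 MiB chunk size
  TOTAL=$(blockdev --getsize64 $DEV)
  CHUNKS=$((TOTAL / BS))
  i=0
  while [ $i -le $CHUNKS ]; do
      if ! dd if=$DEV of=/dev/null bs=$BS skip=$i count=1 \
             iflag=direct 2>/dev/null; then
          echo "read error in chunk $i (byte offset $((i * BS)))"
      fi
      i=$((i + 1))
  done
  # Re-run over a bad chunk with bs=512 to narrow it to single sectors.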
I am not going into why a 1.5 TB drive with a huge amount of spare
space (as I understand it, probably > 500 GB) can't reassign 16
consecutive sectors, as that is a totally different discussion.
Jim