On Fri, 2 Oct 2015, Eric Christopherson wrote:
The SCSI
drives will transparently map out bad sectors, presenting an
apparently defect-free disk (except while a sector is actually failing
:) so the image should be fine to any SCSI drive of that size or
larger (or an sd2scsi type device)
Well, I guess what I'm wondering is: if the SCSI controller or the
disk's PCB exposes an interface to the disk that makes it appear
defect-free, what happens when an OS tries to write to sectors that
actually have defects? I guess this is a good question for any sort of
hard disk, not just vintage or Sun ones. My understanding was that
yes, this was mostly transparent, but that at minimum the number of
usable sectors reported to the OS would go down as sectors get marked
bad.
SCSI disks are most often shipped formatted with a pool of spare sectors
available beyond the nominal externally visible capacity, although the
firmware of many allows reformatting with the size of the pool altered, at
the most extreme to zero. The pool is often partitioned into concentric
zones, called "notches" in SCSI-speak, with a different number of spares
assigned per notch (the number of regular data sectors per track will also
vary between notches then), although not all firmware exposes notches to
interfacing software, and some firmware may not allow reformatting at all.
Other than as parameters for the SCSI format command and in statistics,
these spare sectors are invisible to software. They are allocated
transparently to replace sectors at failed medium spots, although often at
a performance cost: a reallocated sector sits out of place, so linear
access incurs an additional seek or rotational delay.
Reformatting a drive that has reallocated sectors present will normally
reduce the cost or remove it altogether as logical sector positions are
reassigned in a linear manner with bad spots already taken into account
(skipped over).
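As a rough illustration of the above, here is a toy Python model of a
spare pool. Everything in it (the ToyDisk class, its method names, one flat
pool rather than per-notch pools) is made up for the sake of the example
and does not reflect any real firmware. It shows a reallocation pushing one
logical sector out of line, and a reformat putting the logical sectors back
in linear order with the bad spot skipped over.

```python
class ToyDisk:
    """Toy model of a drive with a spare pool (hypothetical, not real firmware)."""

    def __init__(self, visible, spares):
        self.visible = visible                  # sectors the host can see
        self.total = visible + spares           # physical sectors on the medium
        self.bad = set()                        # failed physical sectors
        self.lba_map = list(range(visible))     # logical -> physical
        self.free_spares = list(range(visible, self.total))

    def reallocate(self, lba):
        """Retire the physical sector behind `lba` and use a spare instead."""
        if not self.free_spares:
            raise OSError("spare pool exhausted")
        self.bad.add(self.lba_map[lba])
        self.lba_map[lba] = self.free_spares.pop(0)

    def reformat(self):
        """Reassign logical sectors linearly, skipping known bad spots."""
        good = [p for p in range(self.total) if p not in self.bad]
        if len(good) < self.visible:
            raise OSError("not enough good sectors for this capacity")
        self.lba_map = good[:self.visible]
        self.free_spares = good[self.visible:]

    def out_of_place_steps(self):
        """Count steps in a linear read that leave the natural sector order."""
        steps = 0
        for here, there in zip(self.lba_map, self.lba_map[1:]):
            expected = here + 1
            while expected in self.bad:         # a skipped bad spot is in order
                expected += 1
            if there != expected:
                steps += 1
        return steps


disk = ToyDisk(visible=10, spares=2)
disk.reallocate(3)
print(disk.lba_map, disk.out_of_place_steps())
disk.reformat()
print(disk.lba_map, disk.out_of_place_steps())
```

After the reallocation logical sector 3 lives at physical sector 10, which
costs two out-of-order steps in a linear read; after the reformat the count
is back to zero and one spare is left in the pool.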
When reallocation fails because the pool of spares has been exhausted,
the originating SCSI command will return a hard failure. The same will
happen when a medium error occurs and reallocation has been disabled.
Two control bits govern reallocation, ARRE and AWRE, which enable
sector reallocation on reads and writes respectively.
In the former case reallocation is only made when a medium error is
benign enough for error correction (if implemented) to be able to recover
data actually read; firmware will usually retry a read a number of times
before it gives up and returns a hard error. I have actually seen such
retries succeed sometimes, at which point reallocation prevented
further issues at the particular logical sector.
In the latter case reallocation is always made for writes to sectors
previously observed as bad and not reallocated in a read. This is why
wiping a disk that previously returned read errors will normally fix
it from the logical point of view, without the need to reformat it.
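The rules above can be condensed into a small decision sketch. The function
and parameter names are hypothetical, the retry loop is reduced to a single
"recovered" flag, and the corner cases follow the description here
literally; real firmware may well behave differently in them.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    REALLOCATED = "reallocated"
    HARD_ERROR = "hard error"


def read_outcome(medium_error, recovered, arre, spares_left):
    """Result of a read, per the rules described above (simplified)."""
    if not medium_error:
        return Outcome.OK
    if not recovered:
        return Outcome.HARD_ERROR       # retries and error correction gave up
    if arre and spares_left > 0:
        return Outcome.REALLOCATED      # recovered data moved to a spare
    return Outcome.HARD_ERROR           # reallocation disabled or pool empty


def write_outcome(known_bad, awre, spares_left):
    """Result of a write to a sector that may already be known as bad."""
    if not known_bad:
        return Outcome.OK
    if awre and spares_left > 0:
        return Outcome.REALLOCATED      # no old data to recover, just remap
    return Outcome.HARD_ERROR           # reallocation disabled or pool empty


# A sector that failed a read for good, then gets overwritten during a wipe:
print(read_outcome(True, False, arre=True, spares_left=5))
print(write_outcome(True, awre=True, spares_left=5))
```

The last two lines mirror the wipe scenario: the read fails hard because
the data cannot be recovered, but the later write has nothing to recover
and simply lands in a spare.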
When the pool of spare sectors has been exhausted, the only way to revive
a failing disk is to reformat it with its externally visible capacity
reduced, as long as the firmware permits it. Marking blocks as bad at the
filesystem level would be a distant second choice here, as the presence
of bad (unreallocated) sectors often hurts read prefetches very badly.
The acronyms stand for Automatic Read (Write) Reallocation Enable BTW.
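For reference, in the standard SCSI layout both bits sit in byte 2 of the
Read-Write Error Recovery mode page (page code 01h), AWRE as bit 7 and ARRE
as bit 6. That detail is not from the discussion above, and the helper name
and the sample bytes below are made up. A minimal sketch of decoding a page
image already fetched by other means (e.g. with MODE SENSE):

```python
AWRE = 0x80  # bit 7 of byte 2: reallocate on writes
ARRE = 0x40  # bit 6 of byte 2: reallocate on reads


def reallocation_flags(page):
    """Decode AWRE/ARRE from a raw Read-Write Error Recovery mode page."""
    # Byte 0 carries the page code in its low six bits; 01h is expected here.
    if len(page) < 3 or (page[0] & 0x3F) != 0x01:
        raise ValueError("not a Read-Write Error Recovery mode page")
    return {"AWRE": bool(page[2] & AWRE), "ARRE": bool(page[2] & ARRE)}


# Made-up page image: page code 01h, length 0Ah, both bits set in byte 2.
sample = bytes([0x81, 0x0A, 0xC0]) + bytes(9)
print(reallocation_flags(sample))
```

Changing the bits on a real drive means writing the page back with MODE
SELECT, which is outside the scope of this sketch.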
I hope this clarifies matters a bit.
Maciej