der Mouse wrote:
Doesn't that mean that there's roughly a 10% chance that a memory
error is going to be in the ECC bits rather than the main operating
memory - and that a problem there could result in the system
correcting an imaginary fault?
Only if the ECC is incompetently done; a single-bit error, in the ECC
world, can be in any of the bits. ECC is not simply adding checking
bits to a normal word.
The check bits *can* be independent of the "data bits".
But that usually requires a less efficient implementation
(in terms of the number of bits required).
The Hamming codes are like this -- for example, adding
4 check bits to an 8-bit datum. But, there are several issues
related to the *choice* of code (implementation) besides
"efficiency". You may also want to consider how likely
the code is to correctly decode a given word, etc.
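For illustration only, here is a minimal C sketch of one possible
arrangement: a (12,8) Hamming SEC code where the 8 data bits are
stored unchanged and the 4 check bits sit at the power-of-two
positions (a "systematic" layout, matching the 4-bits-on-8 example
above). This is not a claim about any particular controller. Note
that flipping a *check* bit produces a nonzero syndrome and gets
corrected just like a flipped data bit -- which speaks to the
"imaginary fault" worry in the original question.

#include <stdint.h>
#include <stdio.h>

/* Sketch of a systematic Hamming(12,8) SEC code (no DED here).
 * Codeword uses bit positions 1..12; check bits sit at the
 * power-of-two positions (1,2,4,8), data bits fill the rest.     */

static const int data_pos[8] = { 3, 5, 6, 7, 9, 10, 11, 12 };

static uint16_t hamming12_encode(uint8_t data)
{
    uint16_t cw = 0;

    /* Scatter the 8 data bits into the non-power-of-two positions. */
    for (int i = 0; i < 8; i++)
        if (data & (1u << i))
            cw |= 1u << data_pos[i];

    /* Check bit at position p covers every position with bit p set. */
    for (int p = 1; p <= 8; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= 12; pos++)
            if ((pos & p) && (cw & (1u << pos)))
                parity ^= 1;
        if (parity)
            cw |= 1u << p;
    }
    return cw;                        /* bits 1..12 used, bit 0 unused */
}

/* Returns corrected data; *syndrome is the flipped position (0 = clean). */
static uint8_t hamming12_decode(uint16_t cw, int *syndrome)
{
    int syn = 0;

    for (int p = 1; p <= 8; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= 12; pos++)
            if ((pos & p) && (cw & (1u << pos)))
                parity ^= 1;
        if (parity)
            syn |= p;                 /* syndrome accumulates the bad position */
    }
    if (syn)
        cw ^= 1u << syn;              /* correct the single flipped bit */

    uint8_t data = 0;
    for (int i = 0; i < 8; i++)
        if (cw & (1u << data_pos[i]))
            data |= 1u << i;

    *syndrome = syn;
    return data;
}

int main(void)
{
    int syn;
    uint16_t cw = hamming12_encode(0xA5);
    cw ^= 1u << 2;                    /* flip the check bit at position 2 */
    uint8_t out = hamming12_decode(cw, &syn);
    printf("recovered 0x%02X, syndrome %d\n", out, syn);   /* 0xA5, 2 */
    return 0;
}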
And, you have to decide what *types* of errors you are likely
to encounter!
E.g., the errors you are likely to encounter in a data
channel (communication stream) are different than the
errors you are likely to encounter in a memory array.
And, even the *design* of the memory array can
influence the types of errors you are likely to experience.
E.g., YEARS ago :> when you selected DRAM, you considered
the internal topology before designing your memory test
algorithms. You would deliberately look for failures
that mapped onto the internal structure of the die
(bad column drivers, etc.) as well as use test patterns
to introduce the most "noise" into adjacent memory cells, etc.
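For concreteness, a minimal march-style test sketch in C (roughly
March C- shaped). It reads a value back just before overwriting it
with its complement, walking the array up and then down, which is the
classic way to provoke coupling/disturb faults between neighbouring
cells. The linear addressing is an assumption for the sketch -- a
real test would map addresses through the die's row/column topology
and any address scrambling first, which is exactly the point above.

#include <stdint.h>
#include <stddef.h>

/* Minimal march-style RAM test sketch.  Returns the index of the
 * first failing byte, or -1 if every element passed.               */
static long march_test(volatile uint8_t *ram, size_t len)
{
    size_t i;

    /* up: write 0 everywhere */
    for (i = 0; i < len; i++) ram[i] = 0x00;

    /* up: read 0, write all-ones (disturbs cells already written) */
    for (i = 0; i < len; i++) {
        if (ram[i] != 0x00) return (long)i;
        ram[i] = 0xFF;
    }

    /* up: read all-ones, write 0 */
    for (i = 0; i < len; i++) {
        if (ram[i] != 0xFF) return (long)i;
        ram[i] = 0x00;
    }

    /* down: read 0, write all-ones */
    for (i = len; i-- > 0; ) {
        if (ram[i] != 0x00) return (long)i;
        ram[i] = 0xFF;
    }

    /* down: read all-ones, write 0 */
    for (i = len; i-- > 0; ) {
        if (ram[i] != 0xFF) return (long)i;
        ram[i] = 0x00;
    }

    /* final read of 0 */
    for (i = 0; i < len; i++)
        if (ram[i] != 0x00) return (long)i;

    return -1;
}

Whether all-zeros/all-ones or a checkerboard (0x55/0xAA) stresses
physically adjacent cells hardest depends on how the addresses map
onto the die -- again, the topology question.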
If you use 71 bits of memory to store 64 bits of
data ECC-protected, there normally will not be any 64 of those 71 bits
where you can look to always find the upper-layer data. Instead, the
coding scheme just ensures there are 2^64 distinct 71-bit words with a
Hamming distance of at least 3 (for SEC) or 4 (for SECDED) between any
two of them, with structure that makes it relatively easy to extract
the 64 bits of data from a 71-bit codeword.
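To make the distance argument concrete at toy scale, here is a short
brute-force check in C: it builds the 16 codewords of the extended
Hamming (8,4) code (a 7-bit SEC Hamming code plus one overall parity
bit) and confirms that any two of them differ in at least 4 bit
positions -- the SECDED property. The 64-data-bit case described
above works the same way, just with 2^64 codewords instead of 16.

#include <stdint.h>
#include <stdio.h>

static const int data_pos4[4] = { 3, 5, 6, 7 };

/* Hamming(7,4) encoder: data in bit positions 3,5,6,7, check bits
 * at the power-of-two positions 1,2,4 (codeword uses bits 1..7).   */
static uint8_t hamming7_encode(uint8_t data)
{
    uint8_t cw = 0;

    for (int i = 0; i < 4; i++)
        if (data & (1u << i))
            cw |= 1u << data_pos4[i];

    for (int p = 1; p <= 4; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= 7; pos++)
            if ((pos & p) && (cw & (1u << pos)))
                parity ^= 1;
        if (parity)
            cw |= 1u << p;
    }
    return cw;
}

static int popcount8(uint8_t x)
{
    int n = 0;
    while (x) { n += x & 1; x >>= 1; }
    return n;
}

int main(void)
{
    uint8_t cw[16];
    int min_dist = 8;

    /* Extend each 7-bit codeword with an overall parity bit in bit 0. */
    for (int d = 0; d < 16; d++) {
        uint8_t c = hamming7_encode((uint8_t)d);
        if (popcount8(c) & 1)
            c |= 1u;
        cw[d] = c;
    }

    /* Brute-force the minimum pairwise Hamming distance. */
    for (int a = 0; a < 16; a++)
        for (int b = a + 1; b < 16; b++) {
            int dist = popcount8(cw[a] ^ cw[b]);
            if (dist < min_dist) min_dist = dist;
        }

    printf("minimum Hamming distance: %d\n", min_dist);   /* prints 4 */
    return 0;
}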