Don Y wrote:
For 64 bit memory (assuming you want to treat it as a 64 bit
entity) you need 7 bits (minimum) to "correct a single bit error".
Doesn't that mean that there's roughly a 10% chance that a memory error is
going to be in the ECC bits rather than the main operating memory - and that a
problem there could result in the system correcting an imaginary fault?
Do systems generally do exhaustive tests periodically on the ECC bits only in
order to minimise such problems? Or is the memory for the ECC bits somehow
made from more reliable (but more expensive) memory ICs?
No. Any decent ECC system assumes the extra ECC bits could be in error
too.
If you have an n-bit word size, and add another m bits, you have a memory
location that can store 2^(m+n) values. In fact only 2^n of those are
ever written (those correspond to the 2^n different values you could
store in the n-bit word without ECC). And those values are carefully
chosen so that if any one of the n+m bits flips state, the ECC circuitry
can still work out which of the 2^n values was originally written to that
location. So the error can be corrected -- even if it's in one of the ECC
bits.
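
As a rough illustration of the idea, here is a minimal Python sketch of one
such scheme, a Hamming code. The function names (check_bits_needed, encode,
decode) are made up for this example and are not taken from any real memory
controller. Real hardware does this in logic rather than software, and
commonly adds one further bit so that double errors can be detected as well.

```python
def check_bits_needed(n):
    """Smallest m with 2**m >= n + m + 1, enough to correct one flipped bit."""
    m = 0
    while 2 ** m < n + m + 1:
        m += 1
    return m


def encode(data, n):
    """Return the stored word as a list of bits indexed 1..n+m (index 0 unused).

    Check bits sit at the power-of-two positions, data bits everywhere else.
    """
    m = check_bits_needed(n)
    total = n + m
    word = [0] * (total + 1)
    bit = 0
    for pos in range(1, total + 1):
        if pos & (pos - 1):          # not a power of two: a data position
            word[pos] = (data >> bit) & 1
            bit += 1
    # XOR together the positions of all 1 bits, then set check bits so
    # that the XOR over the whole word comes out as zero.
    syndrome = 0
    for pos in range(1, total + 1):
        if word[pos]:
            syndrome ^= pos
    for k in range(m):
        if syndrome & (1 << k):
            word[1 << k] = 1
    return word


def decode(word):
    """Return (data, syndrome); a nonzero syndrome is the position that flipped."""
    total = len(word) - 1
    syndrome = 0
    for pos in range(1, total + 1):
        if word[pos]:
            syndrome ^= pos
    fixed = list(word)
    if 0 < syndrome <= total:
        fixed[syndrome] ^= 1         # undo the single-bit error
    data = 0
    bit = 0
    for pos in range(1, total + 1):
        if pos & (pos - 1):
            data |= fixed[pos] << bit
            bit += 1
    return data, syndrome


# Flip every one of the 71 stored bits in turn, check bits included,
# and confirm the original 64-bit value always comes back.
value = 0xDEADBEEFCAFEF00D
stored = encode(value, 64)
for pos in range(1, len(stored)):
    damaged = list(stored)
    damaged[pos] ^= 1
    recovered, where = decode(damaged)
    assert recovered == value and where == pos
```

The loop at the end covers the case asked about: when the flipped bit is one
of the check bits (positions 1, 2, 4, ... 64), the decoder identifies that
position, and the data bits come back untouched.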
-tony