Stewpit Indy questions <:-/

15 May 2006

Jules Richardson wrote:
...
  Don Y wrote:
  For 64 bit memory (assuming you want to treat it as a 64 bit
  entity) you need 7 bits (minimum) to "correct a single bit error".
 Doesn't that mean that there's roughly a 10% chance that a memory error
 is going to be in the ECC bits rather than the main operating memory -
 and that a problem there could result in the system correcting an
 imaginary fault?

No.  The code words represent the only "legitimate" way of encoding
a particular "value" (data bits plus check bits).  So, an error in
a check bit is likewise detected (and corrected -- assuming it
is a single error).
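
To make that concrete, here is a minimal sketch using the textbook
Hamming(7,4) code (4 data bits, 3 check bits) rather than whatever a
real 64 bit controller uses.  The function names are made up for
illustration.  Flipping any one of the 7 stored bits, check bits
included, produces a syndrome that names the flipped position:

```python
def encode(d):
    """Hamming(7,4): d is 4 data bits; returns 7 bits for positions 1..7.
    Check bits sit at positions 1, 2 and 4; data at 3, 5, 6 and 7."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def syndrome(w):
    """XOR of the 1-based positions that hold a 1; 0 means a valid code word."""
    s = 0
    for pos, bit in enumerate(w, start=1):
        if bit:
            s ^= pos
    return s


def correct(w):
    """Repair at most one flipped bit; returns (repaired word, syndrome)."""
    w = list(w)
    s = syndrome(w)
    if s:
        w[s - 1] ^= 1
    return w, s


word = encode([1, 0, 1, 1])
for pos in range(1, 8):
    damaged = list(word)
    damaged[pos - 1] ^= 1
    fixed, s = correct(damaged)
    kind = "check" if pos in (1, 2, 4) else "data"
    print(f"flipped {kind} bit {pos}: syndrome={s}, repaired={fixed == word}")
```

The decoder never needs to know whether the bad bit was "data" or
"check"; it just flips whichever position the syndrome points at.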

My explanation wasn't meant to be thorough.  Rather, just a simple
aid to figuring out how many bits you need (at a minimum) for a
*single* bit correction.
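
One common way to arrive at that minimum, assuming a plain
Hamming-style single-error-correcting code: the r check bits have to
be able to name "no error" plus every one of the m + r stored bit
positions, so you need 2^r >= m + r + 1.  A small sketch (the helper
name is hypothetical):

```python
def min_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


for width in (8, 16, 32, 64):
    r = min_check_bits(width)
    share = r / (width + r)
    print(f"{width:2d} data bits: {r} check bits ({share:.1%} of the stored word)")
```

For 64 data bits this gives 7, and 7 of the 71 stored bits (just under
10%) are check bits, which is where the "roughly 10%" figure comes from.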

If you *really* want me to dig out formal definitions for
all of these, I can -- but *you* can probably google an
explanation just as quickly!  :>

...
 Do systems generally do exhaustive tests periodically on the ECC bits
 only in order to minimise such problems? Or is the memory for the ECC
 bits somehow made from more reliable (but more expensive) memory ICs?

No.  To paraphrase a (bad) commercial:  "bits is bits".
The memory controller checks each "entity" fetched from
the memory array.  It dynamically determines what the
check bits *should* be for those data bits and compares
to the check bits read.  If a discrepancy exists, the
controller figures out WHICH bit(s) need to be corrected
and makes the adjustment before presenting the corrected
*data* bits to the host/cpu.

On each write operation, the check bits corresponding to the
data being written are synthesized and stored in the memory
array alongside the data bits.
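
A rough sketch of that read/write path, assuming a toy 8 bit data word
with 4 Hamming check bits (a real controller does this in hardware on
much wider words).  The class and function names are invented for
illustration:

```python
# Code-word positions 1..12: powers of two hold check bits, the rest hold data.
DATA_POSITIONS = [p for p in range(1, 13) if p & (p - 1)]  # 3,5,6,7,9,10,11,12


def check_bits(data: int) -> int:
    """Check bits for an 8 bit value, packed into a 4 bit integer."""
    c = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            c ^= pos
    return c


class EccMemory:
    def __init__(self):
        self.cells = {}     # address -> (data bits, check bits)
        self.corrected = 0  # how many reads needed a repair

    def write(self, addr, data):
        # Synthesize the check bits and store them alongside the data.
        self.cells[addr] = (data, check_bits(data))

    def read(self, addr):
        data, stored = self.cells[addr]
        # What the check bits *should* be, compared with what was read.
        syndrome = check_bits(data) ^ stored
        if syndrome == 0:
            return data
        if syndrome in DATA_POSITIONS:
            # A data bit flipped; the syndrome is its position.
            data ^= 1 << DATA_POSITIONS.index(syndrome)
        elif syndrome & (syndrome - 1):
            # Not a data position and not a single check bit: give up.
            raise ValueError("uncorrectable error")
        # Otherwise a stored check bit flipped and the data is already good.
        self.corrected += 1
        return data


mem = EccMemory()
mem.write(0x10, 0xA5)
data, check = mem.cells[0x10]
mem.cells[0x10] = (data ^ 0x08, check)  # damage a data bit
print(hex(mem.read(0x10)))
mem.cells[0x10] = (data, check ^ 0x02)  # damage a check bit instead
print(hex(mem.read(0x10)))
```

Either way the host sees the original data; the only difference is
which stored bit the syndrome blamed.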

(how errors are reported/handled is immaterial to the controller)

Note that ECC is not infallible.  It just increases the likelihood
of getting good data *if* the number of instantaneous failures
never exceeds the maximum number for which the code was designed.

E.g., typically, you can detect two bit errors and correct *one*
(whereas simple parity detects one and corrects zero).  Note that
a system designed to "detect 2, correct 1" will often gladly
treat *3* errors as a single correctable error and "fix" the wrong
bit, handing back bad data without complaint (just like flipping *two*
bits in a simple parity scheme results in NO parity error -- even
if one of those bits is the parity bit).
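
A small sketch of what happens once the error count exceeds the design
limit, assuming a toy extended Hamming(8,4) code (Hamming(7,4) plus one
overall parity bit) as the "detect 2, correct 1" scheme.  All names are
made up for illustration:

```python
def parity_error(bits):
    """Even parity over data plus parity bit: True if the check fails."""
    return sum(bits) % 2 == 1


def encode(d):
    """Extended Hamming(8,4): positions 1..7 as Hamming(7,4), 8 = overall parity."""
    d1, d2, d3, d4 = d
    w = [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]
    return w + [sum(w) % 2]


def decode(w):
    """Return (status, data); status is 'ok', 'corrected' or 'double'."""
    w = list(w)
    s = 0
    for pos in range(1, 8):
        if w[pos - 1]:
            s ^= pos
    odd = sum(w) % 2 == 1
    if s == 0 and not odd:
        status = "ok"
    elif odd:
        # Looks like one error: at position s, or in the overall
        # parity bit itself when s == 0.
        if s:
            w[s - 1] ^= 1
        status = "corrected"
    else:
        status = "double"  # detected, but not correctable
    return status, [w[2], w[4], w[5], w[6]]


def flip(w, positions):
    """Copy of w with the given 1-based positions inverted."""
    w = list(w)
    for p in positions:
        w[p - 1] ^= 1
    return w


data = [1, 0, 1, 1]
word = encode(data)
print(decode(flip(word, [3])))        # one error
print(decode(flip(word, [3, 5])))     # two errors
print(decode(flip(word, [3, 5, 7])))  # three errors

byte = [1, 0, 1, 1, 0, 0, 1, 0]
stored = byte + [sum(byte) % 2]            # 8 data bits + parity bit
print(parity_error(flip(stored, [2])))     # one flip
print(parity_error(flip(stored, [2, 9])))  # a data bit and the parity bit
```

In this toy code the triple error comes back marked "corrected" with
the wrong data, and the double flip sails straight through the parity
check.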
