Yes, increasing the amount of memory by 1/8 increases the likelihood of
failure by 1/8. The inclination to bury one's head and ignore the potential
for memory or bus failure comes from the competition for price advantage in
the personal computer market, though. The argument I've heard is "if Macs
can live with it, so can PCs," which may not be true, but appears to be true
enough for the typical user.
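For what it's worth, the arithmetic behind that 1/8 is easy to sanity-check.
Here's a rough sketch in Python (purely illustrative, with a made-up per-chip
failure rate) showing that going from 8 chips per byte to 9 raises the chance
of at least one failure by about 12.5%, which is also where Ethan's 12%
figure below comes from:

# Back-of-the-envelope check of the "1/8 more memory, 1/8 more failures"
# claim. Assumes independent, identical failure rates per DRAM chip --
# a simplification, and the rate below is made up for illustration.

PER_CHIP = 1e-4   # assumed probability that a given chip fails in some interval

def p_any_failure(chips, p=PER_CHIP):
    """Probability that at least one of `chips` identical parts fails."""
    return 1 - (1 - p) ** chips

p8 = p_any_failure(8)   # plain byte-wide bank
p9 = p_any_failure(9)   # same bank plus a parity chip
print(p8, p9, p9 / p8)  # ratio comes out to roughly 1.125, i.e. ~12.5% higher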
Once such a memory failure is detected, there's nothing you can do about it
except become aware of the problem and endeavor NOT to save the data that
may be corrupted. I see memory parity errors (most of the PCs here use
parity or single-bit error correction) about twice a year. Normally it's
when a new box is being brought up and the memories aren't seated right or
something on that order. I don't know what that says about the memory
systems of today.
It's been a few years, but I always preferred single-bit correction over
parity in sizeable memory arrays. I designed one fairly large buffer memory
for Honeywell, 72 bits wide and 64 MB deep, which was quite a bit for that
time (1991), with single-bit correction, only to have the manager tell me it
was not needed. "Whom are we helping with this added expense?" was his
position. I pointed out that it would make memory problems a depot or even
field repair, whereas it would be a return-to-factory otherwise. He
insisted, though. The software lead and I agreed we'd base our memory check
on parity, which still allowed for isolation of the faulty SIMM. Since this
was not main system memory but just a data buffer, it didn't matter if part
of it was defective, and firmware could rigorously isolate the faulty device.
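In case it isn't obvious how a parity-style check can finger a single SIMM,
here's a rough sketch of the idea in Python (purely illustrative; the slice
width and the read/write routines are assumptions, not the actual Honeywell
firmware). Each SIMM supplies a known slice of the wide word, so any bit
position that reads back wrong points straight at one module:

# Minimal sketch of fault isolation in a buffer memory, assuming each SIMM
# supplies a known contiguous slice of the wide word. The slice width and
# the write_word/read_word hooks are hypothetical stand-ins.

BITS_PER_SIMM = 9   # assumed: 8 data bits plus 1 parity bit per SIMM

def isolate_faulty_simms(write_word, read_word, addresses,
                         pattern=0x5555555555555555):
    """Write a test pattern, read it back, and map any mismatched bit
    positions to SIMM numbers."""
    suspects = set()
    for addr in addresses:
        write_word(addr, pattern)
        diff = read_word(addr) ^ pattern   # bits that came back wrong
        bit = 0
        while diff:
            if diff & 1:
                suspects.add(bit // BITS_PER_SIMM)
            diff >>= 1
            bit += 1
    return sorted(suspects)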
I'm not sure what you're saying about the relative value of the extra bit of
memory versus the risk of promulgating a transient error into infinity by
recording it as though it were correct, Ethan. You seem to suggest that it
would have been better not to have had the 60-cent memory part in place
rather than to find and repair it once its failure was detected by parity
circuitry. I doubt you believe that, however. It is true that the addition
of parity circuitry means that there is an elevated likelihood of failure
proportional to the increased memory size. It is also true that parity
checking circuitry requires time to work, and can, itself, fail as well.
Increased circuit complexity does increase the statistical probability of
failure. ECC circuitry doesn't decrease the probability of memory failure.
It does decrease the amount of down-time resulting from it, and it avoids
the data loss and down-time associated with single-bit transient failures,
which are more common than hard failures.
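To make that distinction concrete, here's a rough sketch of what the ECC
hardware is doing, written in Python rather than gates: an extended Hamming
(SECDED) code over a 64-bit word with 8 check bits, the same budget as the
72-bit-wide memory above. This is a textbook illustration, not any
particular vendor's implementation; a single flipped bit gets corrected on
the fly, while two flipped bits are detected but not correctable.

def _data_positions():
    """Positions 1..71 that are not powers of two; these hold the 64 data bits."""
    return [p for p in range(1, 72) if p & (p - 1) != 0]

def encode(data64):
    """Return a 72-bit code word as {position: bit}; position 0 holds the
    overall parity bit, positions 1, 2, 4, ..., 64 hold the Hamming check bits."""
    bits = {p: 0 for p in range(72)}
    for i, p in enumerate(_data_positions()):
        bits[p] = (data64 >> i) & 1
    for c in (1, 2, 4, 8, 16, 32, 64):            # each check bit covers the
        covered = [p for p in range(1, 72)        # positions whose index contains it
                   if p & c and p != c]
        bits[c] = sum(bits[p] for p in covered) & 1
    bits[0] = sum(bits[p] for p in range(1, 72)) & 1   # overall (even) parity
    return bits

def decode(bits):
    """Correct a single flipped bit, or flag an uncorrectable double error."""
    syndrome = 0
    for p in range(1, 72):
        if bits[p]:
            syndrome ^= p
    overall = sum(bits.values()) & 1     # 0 if an even number of bits flipped
    if syndrome and overall:
        bits[syndrome] ^= 1              # single-bit error: fix it in place
        status = "corrected bit %d" % syndrome
    elif syndrome:
        status = "double-bit error detected (not correctable)"
    elif overall:
        bits[0] ^= 1                     # only the parity bit itself flipped
        status = "corrected overall parity bit"
    else:
        status = "no error"
    data = 0
    for i, p in enumerate(_data_positions()):
        data |= bits[p] << i
    return data, status

# Example: a single-bit transient upset is corrected transparently.
word = encode(0xDEADBEEFCAFEF00D)
word[37] ^= 1
value, status = decode(word)
assert value == 0xDEADBEEFCAFEF00D and status == "corrected bit 37"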
I guess it's like automobile insurance. If you have assets you need to
protect, you buy it. If you haven't, you don't. My assessment is that Apple
started with the assumption that you don't.
Dick
-----Original Message-----
From: Ethan Dicks <ethan_dicks(a)yahoo.com>
To: Discussion re-collecting of classic computers
<classiccmp(a)u.washington.edu>
Date: Friday, April 09, 1999 6:10 AM
Subject: Parity (was Re: stepping machanism of Apple Disk ][ drive)
> On Thu, 8 Apr 1999, Richard Erlacher wrote:
>
> > My contempt for Apple begins and ends with their total disregard for the
> > value of your data.
> > They designed the MAC with no memory parity assuming that you'd not mind
> > if your data was corrupted without your knowledge...
Multiple studies of memory reliability (DRAM) show that parity memory is
more prone to failure than non-parity memory. If you want reliability, you
have to go to something like Error Correcting Codes (ECC) like the big boys
use. We had 39-bit memory on a 32-bit VAX (11/750) because the extra seven
bits let you *detect* two faulty bits and *correct* a single bit failure.
The Sun Enterprise servers I babysit have ECC memory - we used to get one or
two failures in the machine room per year, but they were logged and corrected
without any loss of data. My Alpha board (AXP-133 "no-name" board) uses
72-pin *parity* SIMMs in pairs to implement ECC on a 64-bit memory bus.
The problem with parity is that yes, you do know that you had a failure, but
now you have 9 bits that might fail, not 8, raising your risk by 12%. DRAM
failures are more often total rather than intermittent. A memory test at
power-up is a better insurance policy than relying on parity to save your
butt.
I did have the parity circuit on a PeeCee cough up a lung once... it was even
a five-slot original PC (256K on M.B.). We were using it into the 90's
because it was merely the terminal for a Northwest Instruments logic/CPU
analyzer that we used to check for problems in our MC68000-based serial
boards. One day, the PC would not come up. Because everything was socketed
and because I owned an IC tester, we got a bottom-of-the-totem-pole tech
grunt to pull each chip and test it. It was a faulty 4164. Labor costs: $25.
Parts cost: $0.60 for a part we stocked thousands of for one of our older
products. I still have the machine. It still works. I wish I had the
invoice for that CPU; the company bought it new in 1981, around $5K, I know,
but I'd like to know the exact figure.
Bottom line: Apple not using parity is not a reason to trash the Mac. How
many PCs have parity since we moved to EDO and SDRAM? It's extra cost and
extra complexity and extra possibilities for failure. Unless you can correct
the failure, it's not mathematically worth the extra expense and reduced
reliability.
-ethan