On Wed, 25 Aug 2010, Eric Smith wrote:
I think the way that I would state it is that just because the system
is working correctly (at the moment) doesn't mean that it isn't broken.
Part of the reason I don't think we'd fix such a problem in the PDP-1 is
that the PDP-1 isn't doing anything critical. We can afford to have
downtime if a latent problem eventually causes a failure. If we've
properly documented that latent problem, we can check for it when the
system does fail, and fix it if necessary at that time.
Your ability to tolerate downtime is significant, as is your thorough
documentation, so that the repair won't end up in the hands of board
swappers.
Y'know, it would be fun to see an entire set of exhibits explicitly about
failure modes. Remember the HUH S100 boards for the TRS80, where one
entire early production run was reversed, but could be used by
soldering all of the components to the back side of the board?
--
Grumpy Ol' Fred     cisin at xenosoft.com
I have a different take on it. What is the breakage?
If its failure is intermittent and CANNOT result in damage,
I'd class that as minor and leave it. But document it!
If the failure could do minor damage that is easily repaired,
it would require evaluation and a decision. Document
the problem and the actions taken or not taken. It's an
intermediate concern.
If the failure can result in significant damage,
unrepairable damage, or the possibility of severe damage,
repair or mitigation should be applied to prevent
that. Document the mitigations/repairs and all the
supporting decisions. This is a major concern, as it
can render the machine a useless inert mass or possibly
worse.
The minor could be a gate with a transistor that has
compromised characteristics. This has historical value.
Intermediate could be a core driver that threatens to
burn up a sense/inhibit line, making for a difficult and
time-consuming core repair.
Major could be a power supply that fails and goes
over-voltage enough to destroy whole sections.
Or a situation like the PDP-15 had, where there was
a fan failure that could result in uncontained fire.
At the extreme this is a hazard to people and other
artifacts. At best it will reduce the population of
working machines to very few or zero. A great loss.
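
The three tiers above can be sketched as a small decision function.
This is only an illustration: the names (Damage, Concern, Fault,
triage) are made up for the example, and it simplifies the minor tier
by looking only at worst-case damage, not at whether the fault is
intermittent.

```python
from dataclasses import dataclass
from enum import Enum


class Concern(Enum):
    MINOR = "minor"
    INTERMEDIATE = "intermediate"
    MAJOR = "major"


class Damage(Enum):
    # Hypothetical worst-case outcomes, one per tier in the post.
    NONE = 0         # failure cannot result in damage
    REPAIRABLE = 1   # minor damage, easily repaired
    SEVERE = 2       # significant, unrepairable, or hazardous damage


@dataclass
class Fault:
    description: str
    worst_case: Damage


def triage(fault: Fault) -> tuple[Concern, str]:
    """Return the concern level and the action that tier calls for."""
    if fault.worst_case is Damage.SEVERE:
        return (Concern.MAJOR,
                "repair or mitigate; document the work and the decisions")
    if fault.worst_case is Damage.REPAIRABLE:
        return (Concern.INTERMEDIATE,
                "evaluate and decide; document what was or was not done")
    return Concern.MINOR, "leave it; document it"


# The three examples given in the post, one per tier.
examples = [
    Fault("gate transistor with compromised characteristics", Damage.NONE),
    Fault("core driver that could burn a sense/inhibit line", Damage.REPAIRABLE),
    Fault("power supply that can fail over-voltage", Damage.SEVERE),
]

for fault in examples:
    level, action = triage(fault)
    print(f"{level.value:<13}{fault.description}: {action}")
```

Every branch ends in documentation; only the decision about whether to
touch the hardware changes from tier to tier.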
I have not specified how the modification or repair is done,
only the process leading to it. For something rare, with
hard-to-find spares, every effort should be made to carry
out the repair in a reasonable way. Safety should never be
compromised.
Allison