Chuck Guzis wrote:
On 30 Oct 2007 at 10:19, Sridhar Ayengar wrote:
Refresh my memory how those worked? Two processors in lock-step? Three processors in a
voting quorum?
Nothing that simple. Special software and hardware. As was driven
home to me by a Tandem engineer who was also a good friend, the term
of art used by Tandem is "NonStop", not "Fault tolerant". A world of
difference between them. For a very good analysis, check out the
paper "Why Do Computers Stop and What Can Be Done About It?" by
Tandem's Jim Gray. It should be somewhere on the web, given its
importance. It describes, in very eloquent terms, the Tandem
philosophy.
I always found Tandem machines a joy to work on. NSK/Guardian/TACL is a nice OS; the user
is nicely close to the hardware (unlike on IBM mainframes, for instance). Almost UNIX-like.
The hardware's not too shabby, either. We once had a Tandem engineer over because one
CPU on our (16-CPU) machine was constantly giving errors. When he finally decided the best
course of action was to replace the CPU, he simply yanked it out. The machine kept running
like nothing happened. A new CPU was inserted and in the process monitor you could
literally see processes moving from their backup CPUs back to the newly installed
one, which was their primary. All of this happened without any of the running programs
skipping a beat.
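
For anyone who wants to picture what the process monitor was showing, here is a toy sketch in Python. It is only a model of the behaviour described above, not how Guardian actually does it: the class names, process names and CPU numbers are all made up for illustration, and it assumes each process simply has one preferred (primary) CPU and one backup CPU. Real process pairs also have to keep the backup's state current, which this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    primary: int  # hypothetical: CPU the process prefers to run on
    backup: int   # hypothetical: CPU that takes over while the primary is gone


@dataclass
class Machine:
    cpus_up: set                      # CPU numbers currently installed and healthy
    processes: list = field(default_factory=list)

    def running_on(self, proc):
        """Return the CPU hosting proc right now, or None if both CPUs are down."""
        if proc.primary in self.cpus_up:
            return proc.primary
        if proc.backup in self.cpus_up:
            return proc.backup
        return None

    def pull_cpu(self, cpu):
        """Model the engineer yanking a CPU board out of the running machine."""
        self.cpus_up.discard(cpu)

    def insert_cpu(self, cpu):
        """Model a replacement CPU being inserted."""
        self.cpus_up.add(cpu)

    def placement(self):
        """Map each process name to the CPU it currently runs on."""
        return {p.name: self.running_on(p) for p in self.processes}


machine = Machine(
    cpus_up={0, 1, 2, 3},
    processes=[Process("server-a", primary=2, backup=3),
               Process("server-b", primary=0, backup=1)],
)

before = machine.placement()   # everything on its primary
machine.pull_cpu(2)
during = machine.placement()   # server-a now runs on its backup, CPU 3
machine.insert_cpu(2)
after = machine.placement()    # server-a has moved back to CPU 2

print(before, during, after, sep="\n")
```

The point of the sketch is only that placement is derived from which CPUs are up, so pulling a CPU and inserting a new one moves the affected processes to their backup and back again without touching the others.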
They had cute promotional items, too. One of my favourites was a coffee mug with two
handles on it.
,xtG
tsooJ