I worked on a UK-designed lock-step fault-tolerant computer for Sun
Microsystems, the Netra FT1800.
This was a huge beast based on the E450, with all the modules duplicated
in a hot-swap design.
You could just pull out the CPU module (each had up to 4 * 450MHz UltraSPARC
II processor modules and 4GB of RAM) and the system would continue running
without a glitch, and on reinsertion the system would reload and
resynchronise.
You could do this with any of the modules: disk, I/O (PCI cards fitted in
hot swap carriers), fan modules, PSUs. It was a direct competitor to Tandem.
http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/Netra_f…
There were issues with the lock step and sometimes they would go out of sync
but they never just stopped...
Anyone ever seen these?
Ta,
Andy.
On 30/10/2007, Chuck Guzis <cclist at sydex.com> wrote:
On 30 Oct 2007 at 9:17, Chris Kennedy wrote:
Yep, that was the Tandem way. You could watch the lights blink on the
first processor, count two and watch the lights do precisely the same
thing on the second.
Yes, but as I said "it's nothing that simple"--to say that it was
would be completely discounting the enormous investment in software
that Tandem made to produce their NonStop systems.
Heck, back around then, a friend and I prototyped a system with three
PC/XT's and a proprietary expansion card that did three-way voting
and also performed hot replacement of failed processors. Basically
a garage operation and nearly sold to a then-cash-rich Everex. Maybe
good enough for process control, but too weak for anything more
involved than that. Our selling point was that it was off-the-shelf
and cheap. I think I still have the OrCAD files for our board
somewhere on a 5.25" 360K floppy.
We did nothing about what software ran on the system--and that was
the giant weakness. Without software, it was just another
interesting piece of iron.
Simple redundancy doesn't always identify which of the two systems is
producing the error--only that there was an error--and that's where
Tandem's genius comes in.
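
To make the point about pairs versus voting concrete, here is a minimal
sketch in Python. It is an illustration only, not how Tandem, the Netra or
the PC/XT board actually worked; the function names compare_pair and vote
are made up for this example. With two units a mismatch tells you something
is wrong but not which unit to blame; with three, the odd one out can be
named as long as the other two agree.

```python
from collections import Counter


def compare_pair(a, b):
    """Two-way comparison: detects a mismatch but cannot name the faulty unit."""
    return a == b


def vote(outputs):
    """Majority vote over the outputs of several redundant units.

    Returns (agreed_value, suspect_indices). When no strict majority
    exists, agreed_value is None and every unit is a suspect.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No strict majority: the error is detected but cannot be located.
        return None, list(range(len(outputs)))
    suspects = [i for i, out in enumerate(outputs) if out != value]
    return value, suspects


# Two units disagree: we know there is an error, but not where.
pair_ok = compare_pair(0x2A, 0x2B)

# Three units, one disagrees: unit 1 is outvoted and can be swapped out.
agreed, suspects = vote([0x2A, 0x2B, 0x2A])
```

Calling vote() with only two disagreeing outputs returns None and marks both
units as suspects, which is the weakness of simple redundancy described
above.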
Tandem was a whole world apart--not only did they have hardware
redundancy (which would have been no great shucks back then), but
their software was constructed along a modular transaction-based
model, so that transactions were never lost (hence the popularity of
these in the banking sector).
Cheers,
Chuck