>>>> "Vintage" == Vintage Computer Festival <vcf(a)siconic.com> writes:
Vintage> On Wed, 19 May 2004, Tony Duell wrote:
> Well, in my experience it's more a case of 'You'll be down for a
> day, and then the machine will run for 5 years without problems'
> vs. 'We'll be up in an hour, but you'll have all sorts of
> problems afterwards, and you'll probably keep on having to have
> another hour's fiddling every other week or so'. Of course they
> don't tell you that last bit ;-)
Vintage> Tony, in some cases, a business being down a whole day means
Vintage> that business goes under completely. It's not as cut and
Vintage> dry in the "real world".
I learned about low tolerance for downtime in my first job at DEC --
field support (traveling software repairman) for Typeset-11. When
your customers are newspaper publishers, being down for an hour is
very serious. Being down for a day is unthinkable.

Same sort of thing applies to the stuff I work on now (storage area
network storage boxes). When hundreds of users have several terabytes
of stuff on your box, it had better be up and stay up.

Tony is right that ignorant boardswapping doesn't help you then. But
neither does repair in the field.

What you do instead is a combination of fault tolerance (redundancy --
a failure doesn't take the system down) and accurate fault isolation
(when something breaks you can immediately tell what it is, so when
you do a module swap it WILL be the right module). This puts a large
burden on the designers, but that's the right place for the burden.

paul