Windows for critical infrastructure (was Re: UNIX V7)

11 Jun 2009

Yeah, I re-read that later (see my later mails).
The whole setup sounds like a huge lesson in project mismanagement,
even leaving out the OS decision.  (NT was not really ready for
mission-critical tasks.  My UNIX experience at the time (1998?) was with
Linux, and I don't know that I'd trust it with the task either.  Maybe
Solaris?  No idea.  What was the recommended high-availability UNIX OS
at the time?)
But back to the Yorktown:  So you're the designer, and you're telling me
that:
1) You're putting commodity PC hardware (regardless of the OS) in the
_sole_ position of controlling important functions of the ship, without
ANY sort of backup?
1.5) Really?
2) You're putting said system in place in a very ad-hoc manner -- quote
from the Wired article: "They rushed this stuff on the ship, there was
no real prototype, and then they tried to make things work as they went
along,"  (And it does sound very ad-hoc -- the database does no data
validation?  ("a crew member entered a zero into a /database/ field
causing a divide by zero...") The client app doesn't bother to check the
data either?  There's no backup system?  There's _no_ backup system of
ANY kind?)
Yeah, that sounds like a recipe for success no matter how you slice it :).
I admit, my experience/bias with Windows NT 4.0 makes me want to know
whether the divide by zero / buffer overflow was a _complete_ system
crash (i.e. a BSOD) which is what everyone who hates Windows assumes it
was, or whether it was more along the lines of: 1) operator puts
invalid data in the database, 2) all client machines pick up and use the
new invalid data without validation, 3) all client ship-controlling apps
crash due to bad data, bringing down the ship system.
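
To make the two missing checks concrete, here is a minimal Python sketch
of validating a value at entry and then guarding the division on the
client anyway.  Nothing here is taken from the actual Yorktown software,
whose internals I have never seen described; the names
(InvalidFieldError, validate_divisor, safe_ratio) are hypothetical and
only illustrate the general idea.

```python
# Hypothetical sketch: none of these names come from the Yorktown system.

class InvalidFieldError(ValueError):
    """Raised when an operator-entered value fails validation."""


def validate_divisor(raw):
    """Check a value at the point of entry, before it is stored."""
    value = float(raw)  # raises ValueError on non-numeric input
    if value == 0:
        raise InvalidFieldError("field must be non-zero")
    return value


def safe_ratio(numerator, divisor, fallback=None):
    """Client-side guard: do not assume the stored value was validated."""
    try:
        return numerator / divisor
    except ZeroDivisionError:
        # Degrade to a fallback instead of taking the whole app down.
        return fallback
```

Either check on its own would stop a single zero from propagating; having
both means a bad write path does not automatically take out every reader.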
I say this, because as many problems as NT had, it wasn't _all_ that
easy to write a client app that would BSOD it :).  (Don't get me started
about a couple of in-box hardware drivers, though...)  Oh, there was
that infamous CSRSS.exe bug, but you'd have to WANT to trigger that one :).
There's a lot of bias in the articles I read on the Yorktown fiasco,
generally anti-Windows (see: the Wired article I'm quoting above).  I've
never seen any _real_ information on the crash other than generic
clauses like "the system(s) crashed" which could mean anything between
"an outright BSOD of every system on the network" and "a poorly
written/specified/designed app/distributed system going down".  If there
is a real post-mortem analysis of this that's been publicly released,
I'd be interested in reading it.
Just my (at this point, way more than) two cents.
And I keep promising myself I won't get involved in these kinds of
off-topic discussions.  I'm a bad boy.  I'm done now :).
- Josh
Sridhar Ayengar wrote:
...
  Josh Dersch wrote:
 
 Apparently.  Go look up the news archives for the Yorktown story.
 Read the facts before you make assumptions and accuse people of not
 knowing what they're talking about just because you don't like what
 they're saying.

 I've never seen an explanation of what the failure actually was, just
 lots of articles stating that Windows NT was being used as the OS and
 -something- went wrong.  It could just as easily have been buggy
 client software that crashed.

 The explanation at the time was that one of the operators put a zero
 into a database field he shouldn't have, which caused a divide-by-zero
 problem which led to a buffer overrun which cascaded to all of the
 workstations on the network.

 Peace...  Sridhar
