jim s wrote:
This discussion reminded me of a very interesting series of incidents
documented on the web pages linked to below. The problem was related
to the control computers which governed the exposure of patients who
were being treated with a particle beam from a linear accelerator.
The title is the Therac-25 Accidents, if you are already familiar with
it.
I have not reread it, so I don't recall whether this was a PDP-11
instrument or not, but this discussion certainly reminds me of it.
Jim
http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Therac_1.html
Jerome Fine replies:
I thought that a timely response would be helpful to clarify
any possible confusion.
First I suggest that it needs to be strongly emphasized
that the actual PDP-11 hardware was NOT at fault, it was
the software, as far as I remember and understand about
the problem.
Second, I have not checked the link as yet, but I believe
that AECL was the company involved and responsible for both
the hardware configuration and the software.
Third, I seem to remember that the problems were the result
of a race (or timing) bug in the software which occurred when
a certain (unexpected?) sequence of keys was used on the VT100
terminal that the operator used to control the radiation which
was delivered to the recipient (in medical terms, the patient).
If all of the above is correct, then I am not surprised that
the (so-called?) accidents took place, since the use of
less-than-expert systems analysts and the lack of time for
software checkout in those decades were probably systemic
problems for all software projects, not just medical systems.
Even these days, I am still finding bugs in RT-11 operating
system software which, although not catastrophic, are the
result of obvious short cuts by programmers who should not
have used tricky code which works 99% of the time, and in
particular 100% of the time with the existing set of programs
written by the manufacturer (DEC in this case). Although one
bug has long been fixed (with V05.05 of RT-11 in 1989) as a
result of other software added to the operating system, the
problem was caused by (note that ^RUSR means the Radix-50
value of the 3 characters "USR"):
        CMP     (R2),#<^RUSR>   ;Branch if the "USR"
        BEQ     SET.USR.CODE    ; is being "SET"
        ROR     STATUS.WORD     ;Set up Status Word for default device
As long as the value pointed to by (R2) was LESS than <^RUSR>
(compared as unsigned 16-bit values), the CMP instruction left the
CARRY bit set and the ROR instruction rotated it into the high bit as
expected (Status Word which was originally zero became 100000).
Since the value pointed at by (R2) was the first two characters
of the name of a device driver, the code worked perfectly ONLY
when the device driver name was less than <^RUSR>. Unfortunately,
the latest version of RT-11 which is a so-called hobby version
is V05.03 from 1985. Thus hobby users are still going to have
the problem if they want to use a device driver name such as
VM(X).SYS with V05.03 of RT-11 and are hoping for a transparent
substitution of VM(X).SYS with the modified HD(X).SYS device
driver when they are using Ersatz-11, which I agree will not
apply to many hobby users. But I hope that my example illustrates
the reason that bugs tend to be in both applications and operating
systems in the first place.
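
To make the arithmetic concrete, here is a small Python model of the
three instructions quoted above. It is only a sketch and not RT-11
code: the function names are made up for illustration, it assumes the
usual PDP-11 Radix-50 character codes (blank = 0, A-Z = 1-26, $ = 27,
. = 28, digits = 30-39), and it assumes the name is held as two
characters padded with a blank.

```python
# Radix-50 character codes: the index in this string is the code.
# Code 29 is unassigned in the PDP-11 set; "%" is only a placeholder here.
RAD50_CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"


def rad50(text):
    """Pack up to three characters into one 16-bit Radix-50 word."""
    padded = text.upper().ljust(3)[:3]
    value = 0
    for ch in padded:
        value = value * 40 + RAD50_CHARS.index(ch)  # 40 decimal = 50 octal
    return value


def cmp_carry(src, dst):
    """CARRY after 'CMP src,dst': set when src < dst as unsigned words."""
    return 1 if (src & 0xFFFF) < (dst & 0xFFFF) else 0


def ror(word, carry):
    """ROR on a 16-bit word: old CARRY enters bit 15, bit 0 becomes CARRY."""
    new_carry = word & 1
    result = ((word >> 1) | (carry << 15)) & 0xFFFF
    return result, new_carry


def status_after_set(name, status_word=0):
    """Model CMP + ROR for a name other than USR (hypothetical helper)."""
    carry = cmp_carry(rad50(name), rad50("USR"))
    result, _ = ror(status_word, carry)
    return result


for name in ("HD", "VM"):
    print(name, oct(rad50(name)), oct(status_after_set(name)))
```

Under these assumptions "HD" packs to a smaller value than "USR", so
the CARRY bit is set and the Status Word becomes 100000 (octal), while
"VM" packs to a larger value, the CARRY bit is clear and the Status
Word stays at zero.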
Since DEC did not have any device driver which both had a
conflicting name, such as VM(X).SYS, and depended on the CARRY
bit being set after the CMP instruction, there were no conflicts
in the distributed code. Unfortunately, the documentation very
explicitly stated that the users who wrote their own device
drivers could always expect the CARRY bit to be set resulting
in a value of 100000 being passed to the user when no unit
number was specified.
About 3 months ago, the problem first occurred, but I was still
busy with other more pressing modifications and enhancements.
When those were finally finished, it took me a few days during
the past week to determine why a device driver with the name of
VMX.SYS (that I am using as a substitute for the DEC version)
did not work when I attempted to:
COPY HDX.SYS VMX.SYS
SET VM NAME
The purpose of changing the original name of the device driver
was to substitute the code I wished to use for the DEC version,
totally transparently as far as the rest of the process that I
was doing is concerned, while gaining the advantages of the
modified driver (3 times the speed of DEC's version, 4 times
the available number of blocks, all of the emulated PDP-11
memory now available instead of being used as a disk drive, a
smaller LOADed footprint AND the option of being able to toggle
the device between WRITE and NOWRITE).
Fortunately, the problem in this specific case could be solved
by ignoring the bug. However, with other SET commands which
rely on the correct value being supplied, such as being able to
distinguish between:
SET VM: NOWRITE
SET VM0: NOWRITE
other solutions will still have to be found.
Sincerely yours,
Jerome Fine
--
If you attempted to send a reply and the original e-mail
address has been discontinued due to a high volume of junk
e-mail, then the semi-permanent e-mail address can be
obtained by replacing the four characters preceding the
'at' with the four digits of the current year.