PDP-11/45 RSTS/E boot problem
William Pechter
pechter at gmail.com
Thu Feb 14 22:16:01 CST 2019
> Message: 2
> Date: Wed, 13 Feb 2019 15:03:41 -0500
> From: Paul Koning <paulkoning at comcast.net>
> To: Jay Jaeger <cube1 at charter.net>, "General Discussion: On-Topic and
> Off-Topic Posts" <cctalk at classiccmp.org>
> Subject: Re: PDP-11/45 RSTS/E boot problem
> Message-ID: <C07861A6-BFD8-4AD0-AAB9-F4715904BB60 at comcast.net>
> Content-Type: text/plain; charset=us-ascii
>
>
> > On Feb 13, 2019, at 1:20 PM, Jay Jaeger via cctalk
> <cctalk at classiccmp.org> wrote:
> >
> > ...
> > Maybe that story about FE's using Unix as a test to confirm operation
> > even when diagnostics said the machine was OK was not so much just a
> > legend?
>
> It still feels like a legend. My experience with DEC field service
> engineers is that they used the diagnostics. In the PDP-11 era, Unix
> knowledge around DEC was pretty sparse, especially early on when it
> could be found only in the Telephone Products Group (Armando
> Stettner). RSTS would be more plausible, but I never saw that in the
> hands of FS engineers either.
> By and large diagnostics would find problems. I've seen a number in
> the 1970s, including a messy data path failure in the 11/45 MMU where
> we (college students) did the initial diagnosis while the FS engineer
> was on his way. My suspicion is that things not solved by diagnostics
> would be escalated to the "wizard from Maynard". And they'd probably
> start replacing whole subsystems. I've seen that once, when our
> college RSTS-11 system (11/20, 16 DL-11 lines) was crashing on average
> once a day for months. DEC brought in several of those "wizards". The
> "fix" was to replace the 11/20 by a "spare part" -- an 11/45 with more
> memory, a DH11, and RSTS/E. Decades later I was told that the wizards
> actually pinned the blame on the college FM broadcast transmitter,
> about 200 feet down the hall from the computer center. That may well
> be, though I didn't hear that at the time. RSTS did get used in
> manufacturing, at Final Assembly & Test sites like Westminster MA and
> Salem NH, where PDP-11 systems large enough to run RSTS/E were
> subjected to a load test of exerciser programs running under that OS.
> The way it was explained to us is that a system that would be happy
> with such a test would also be happy with any customer application.
> It's not clear if that was because RSTS would load things more than
> most, or was more finicky about hardware glitches than most, but it
> certainly was the practice for quite some time. Of course, not all
> PDP-11 configurations could be tested that way.
>
> paul
I guess the experience in NJ was a bit different since AT&T had two
dedicated Field Service offices who handled their sites including Bell Labs.
I was on the Commercial/Government side from 81-86 and we didn't get to
play with RSTS on customer sites at all (but sometimes we got to play on
the in-house machines in Princeton or on our own hardware).
It was a bit different on the VAX side since many diags were run under
VAX/VMS and as a brand new hire I was doing VAX installs -- including
installing VMS 2.x and 3.x on 11/780's and 11/750's at install time.
If they had paid for software installation -- the software guys would
wipe and reinstall.
If not we left the pack and prayed the customer wouldn't wipe the diags
that we installed on the disk when we built the VMS pack. Realistically
the only thing the customer needed to do after we got done was tweak the
system parameters, check the swap etc. and lay on the layered products
like languages.
Things got much more interesting when the VMS 3.x and 4.x systems got
CI780's and HSC50's. That was more involved than the easy VMS 2.x-3.x
install.
As far as the 11/70's -- I'm building a pidp1170... My last 11/70
install was around 84 or so when I put in a late DECDatasystem 570 blue
11/70 with the FCC Cabinets at AT&T in Freehold.
As far as the Wizard from Maynard -- one story from my branch support
guy (rumored to be about his brother on the 11/70 line, I think in
Westminster MA... not Salem or other NH plants) was about an
intermittent 11/70 that would crash every couple of days, even though
they could run all the diags and DEC X11 with no issues.
They called over their in-house wizard who ran toggle-in programs from
the front panel -- playing the switches like piano keys with both
hands. After about a half hour his comment was "Clean the terminator
fingers."
Machine ran like a SOB once the gold fingers were cleaned.
Weirdest 11/70 mess I had was after I left DEC to work for a third party
maintenance group. Their regional support was in Dallas. I was in NJ.
They couldn't find their support guy so they rushed me on a plane to
Chicago to work with two techs who were babysitting a mess they had no
clue on.
The site was WW Granger in Skokie and I arrived at 3AM... They had a 5
or 6 story warehouse which was a totally robotic automated site picking
water heaters and other industrial equipment from what looked like an
over-sized 6 floor tape library. Two 11/34's running RSX11 ran the
picker. One was down for weeks. Their 11/70 was half disassembled with
two techs working on it. They were VAX trained at a third party school
but they weren't PDP11 techs. An RM03 on the 11/34's was down as well.
The 11/70 was a RSTS/E box doing all the billing and inventory for
Granger at the site.
I walked in at 3AM with my Digital truckers cap on and found they
couldn't boot XXDP+ from tape.
The OS wouldn't come up either.
The customer gave me a pile of error logs dating back over six months --
(I think Sept through March) and they all showed memory management error
aborts and retries.
The techs who thought they were changing memory never found the MOS
memory box... they were swapping cache boards thinking they were memory.
Went to 10000 and deposited 014747 and ran it... It failed either on
addresses ending in 0 or 4, or on addresses ending in 2 or 6.
The MOS on the 11/70 had two controllers and interleaved the memory.
Pulled one of the interleave controllers -- ran the toggle in and it
worked. Aha... bad memory controller.
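
For anyone who hasn't seen the trick: 014747 is MOV -(PC),-(PC), an
instruction that copies itself one word lower in memory and then executes
the copy, so it walks down through memory until something breaks. Below is
a rough Python sketch of the idea, not a real emulator. The function names
are made up for illustration, the "bad bank silently drops writes" fault
model is an assumption, and so is the bank select on address bit 1 (chosen
only because it matches the 0/4 versus 2/6 pattern described above). The
sketch also stops at address 0, which real hardware would not do.

```python
WORD = 0o014747  # MOV -(PC),-(PC)


def decode_double_operand(word):
    """Split a PDP-11 double-operand instruction into its fields."""
    return {
        "opcode": (word >> 12) & 0o17,   # 01 = MOV (word)
        "src_mode": (word >> 9) & 0o7,   # 4 = autodecrement
        "src_reg": (word >> 6) & 0o7,    # 7 = PC
        "dst_mode": (word >> 3) & 0o7,   # 4 = autodecrement
        "dst_reg": word & 0o7,           # 7 = PC
    }


def bank_of(addr):
    """ASSUMPTION: two-way word interleave, bank chosen by address bit 1."""
    return (addr >> 1) & 1


def run_walkdown(start=0o10000, bad_bank=None):
    """Simulate the instruction copying itself down from `start`.

    bad_bank: None for healthy memory, or 0/1 for a bank that drops writes
    (hypothetical fault model). Returns the address where the next fetch
    no longer finds the instruction, or None if the copy reached address 0.
    """
    memory = {start: WORD}
    pc = start
    while pc > 0:
        if memory.get(pc) != WORD:
            return pc              # fetched garbage: the walk dies here
        pc += 2                    # PC steps past the fetched instruction
        pc -= 2                    # source -(PC): back onto the instruction
        value = memory[pc]
        pc -= 2                    # destination -(PC): one word lower
        if bank_of(pc) != bad_bank:
            memory[pc] = value     # a healthy bank stores the copy
    return None


if __name__ == "__main__":
    print(decode_double_operand(WORD))
    for bad in (None, 0, 1):
        result = run_walkdown(bad_bank=bad)
        print(bad, "ok" if result is None else oct(result))
```

With one bank faulted the walk dies on the first address that lands in
that bank, which is the kind of pattern that points at one of the two
interleaved controllers rather than at a single memory array.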
Booted diags and sent for the board spare.
Decided the RM03 would be a bitch to work on without the tester or tools
and the management found a spare locally at a used DEC joint in the
area. Swapped the drive once we carried the new one up the stairs.
The 11/34 had a problem... the machine wouldn't boot and the run light
(IIRC) was on all the time.
The machine had two full unibus dd11-dk boxes even though it didn't need
them all.
Terminated at the CPU backplane and did toggle-ins. OK. Worked...
jumpered out the next UNUSED segment of the Unibus backplane with a
Unibus ribbon cable and the problem was gone.
The guys had been there over two weeks digging themselves a hole. Third
party service on DEC stuff varied with the person. Some were ex-DEC
genius types who were consultant level experts on the hardware. Some
just knew to swap the board with the red LED lit.
Another time I ran into an engineer who told me (chip info here faked --
don't pull the prints...too many
years to have kept TE16/TM03 prints).
A call comes in to dispatch with the following information:
"The TE16 at Naval Air Propulsion, Trenton is down. It doesn't come on
line. The light is lit but the system doesn't see it. I put the board on
an extender and U34 pin 12 is low and doesn't go high.
I need someone to come out and change the chip."
I call the site back. I'm in Princeton 15-20 minutes away. I get the
customer on line and tell
him I'll be there in 3 weeks or so. DECservice 2 hour response won't
cover the call since he wants a chip changed in 1985 and we don't stock
them -- so it will be a special source issue for logistics and we'll get
back to him. Or... I can swap the M8916 Logic And Write board in
about 15 minutes. Does he want it fixed or does he want to prove he
called the correct chip...
Bill