PDP-11/45 RSTS/E boot problem

Noel Chiappa jnc at mercury.lcs.mit.edu
Sat Feb 9 10:45:07 CST 2019


    > From: Fritz Mueller

    > This seems the best place to start with the LA this weekend then.

I'm going to respectfully semi-disagree! I think that at this point there's a
good chance we can localize to within a gate or two before we start applying
test instuments.

My thinking starts with two pieces of data; i) your discovery that when the
MM trap happens, the end of the pure text segment contains a fragment of code
from 04000 lower in the text, and ii) the data that the location in main
memory where that _should_ have been is full of zeros - i.e. never been
written into.

The latter is, I think, due to the fact that Unix clears all of main memory
on startup; I think it's just chance that that memory hasn't been used yet
for something else, and is still 0's. (Unix does clear main memory in a few
places during regular operation - e.g. when expanding the stack, the newly
added area is 0'd - but in general, e.g. when swapping in a pure text, it
doesn't seem to bother, which makes sense since it's all about to be
over-written anyway.)

Anyway, those two, together with my previous analysis that this was unlikely
to have happened when the text was first being read in from the file, block
by block, lead me to believe that the likely cause is that the BAR on the
RK11 skipped up a whole bunch (setting the 04000 bit at some point) when it
was reading the pure text back in from the swap, and skipped writing into
that zero-filled block of main memory, putting the stuff that should have
gone there up 04000, instead.

(Why it's swapping the text back in is too complicated to be worth explaining
here; anyone who _really_ wants to know should look here:

  http://gunkies.org/wiki/Unix_V6_internals

in the last section, "exec() and pure-text images".)

It's easy to confirm all these suppositions/deductions fairly easily, without
having to connect up, configure, etc the LA: we can just stop the machine
after the text is first read in (in xalloc()) from the file-system, and
confirm that the text looks good there; if so, either the swap-out (albeit
unlikely, since that doesn't account for the 0's) or subsequent swap-in had
an issue. The latter would be easy to confirm: just halt the machine after
the text is swapped in, and see what the RK registers contain.

At that point, as I said, we'll know to within a few gates where the issue
is, and then it'll be time to bring out the LA.

Actually, a plain oscilloscope would do; it's interesting to recollect that
these machines were designed and maintained without benefit of a LA, purely
with an oscilloscope! We're so spoiled now! :-)

	Noel


More information about the cctech mailing list