PDP-11/70 progress (and a cry for help)

Mon Feb 15 19:55:58 CST 2021

Hi all --

Thought you all might be interested in an update, and I'm also looking for
advice in debugging the current issue I'm hitting.

After replacing the clock crystal on the TIG, the system started showing
signs of life, but the Load Address switch would stop working after the
machine had been powered on for 10-30 seconds, though it worked fine when
single-stepping via the KM11.  Brought the DAP board out onto the extender
for debugging and the
problem went away.  Reinstalled the board after cleaning the slot (again)
and the problem hasn't recurred since.  First bad backplane connection, I'm
sure it won't be the last.

After this, addresses could be loaded and data could be toggled into memory,
but instructions wouldn't execute.  Tracing through the microcode with the
KM11 indicated that the microcode flow was aborting early and returning to
the main console loop (via BRK.90) before the instruction fetch at FET.00;
this was due to the TMCB BRQ TRUE H signal being stuck high.  Probing of
the TMC board revealed a bad 74H30 at E70, which had its output stuck at
1.65V or so, just high enough to confuse things.
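
To illustrate why 1.65V is trouble, here is a minimal sketch assuming the
standard TTL input thresholds (0.8V maximum for a low, 2.0V minimum for a
high).  The function name is made up for illustration, and the exact limits
for any particular part should be checked against its datasheet.

```python
# Standard TTL input thresholds in volts (assumed; check the part's datasheet).
V_IL_MAX = 0.8   # highest voltage guaranteed to be read as logic 0
V_IH_MIN = 2.0   # lowest voltage guaranteed to be read as logic 1

def classify_ttl_input(volts):
    """Return 'low', 'high', or 'undefined' for a voltage at a TTL input."""
    if volts <= V_IL_MAX:
        return "low"
    if volts >= V_IH_MIN:
        return "high"
    return "undefined"

# The stuck E70 output level reported above.
print(classify_ttl_input(1.65))
```

A level in the undefined band may be read as either state by whatever it
drives, which fits a fault that is "just high enough to confuse things".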

Now instructions would execute, but the PC would contain garbage afterwards.
After tracing the microcode and staring at the flow diagrams, all signs
pointed to the PCB register (twin to the PCA register that is used for
storing PC data) having trouble.  The garbage in the PC was always in bits
6-11; everything else was fine, which
pointed to a 74S174 at H47 on the DAP board.  Replaced and now instructions
execute!
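
The bit-range reasoning above can be sketched as a quick check: XOR the
expected and observed PC and see whether every differing bit falls within
bits 6-11 (a 74S174 is a hex flip-flop, so a six-bit-wide fault fits one
chip).  The example values and the names below are hypothetical, not
readings from the actual machine.

```python
def differing_bits(expected, observed):
    """Return the bit positions where two 16-bit words differ."""
    diff = (expected ^ observed) & 0o177777
    return [bit for bit in range(16) if diff & (1 << bit)]

# Mask covering bits 6-11, the six bits that pointed at H47.
H47_MASK = 0b111111 << 6   # 0o7700

# Hypothetical example values (octal), NOT captured from the real 11/70.
expected_pc = 0o1002
observed_pc = 0o5302

bits = differing_bits(expected_pc, observed_pc)
all_in_h47 = all(H47_MASK & (1 << b) for b in bits)
print(bits, all_in_h47)   # [6, 7, 11] True
```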

Mostly.  They seem to execute properly when single-stepping instructions,
or when running off the RC clock at a rate of about 16-20MHz; any faster
than that and things stop working correctly.  This is what I'm currently
banging my head against -- if anyone has any experience with the 11/70 or
wants to stare at the manuals for a bit (and who doesn't?), I'd appreciate
any extra input.

There are a number of different issues; I'm currently focusing on
two-operand instructions that take an immediate argument (MOV #10, R0 or
ADD #42, R5, for example).  The behavior here is a bit befuddling and I
can't quite figure out how it ends up happening, given the microcode.

I'll use ADD as a representative example.

An ADD #10, R0 instruction (followed by HALT) poked in at address 1000
executes properly -- R0 gets 10 -- but afterwards the PC is corrupted: it
contains 2, rather than 1004.  In the general case, "ADD #X, R0" ends with
PC containing 2 + <original value of R0>.  (MOV shows the exact same
behavior, except that there's no addition, obviously.)
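
To make the expected-versus-observed PC arithmetic concrete, here is a small
sketch in octal.  It only restates the symptom described above (PC ends up
as original R0 plus 2); it is not a model of the KB11 data paths, and the
function names are made up for illustration.

```python
# Test program as poked in from the front panel (all values octal).
# 062700 is ADD with source mode 27 (immediate) and destination R0.
program = {
    0o1000: 0o062700,  # ADD #10, R0
    0o1002: 0o000010,  # immediate operand
    0o1004: 0o000000,  # HALT
}

def expected_pc(start, n_words):
    """PC after an instruction occupying n_words 16-bit words at start."""
    return (start + 2 * n_words) & 0o177777

def observed_pc(r0_before):
    """Faulty PC as reported above: original R0 plus 2."""
    return (r0_before + 2) & 0o177777

for r0 in (0o0, 0o1, 0o100):
    print(f"R0={r0:o}: expected PC={expected_pc(0o1000, 2):o}, "
          f"observed PC={observed_pc(r0):o}")
```

Trying a few different starting values of R0 this way gives a quick table
to compare against what the front panel actually shows.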

This value of PC is shown in the Address lights, as well as when examining
the register from the front panel (at 17777707).

When single-instruction-stepping the processor this instruction executes
perfectly: R0 gets R0+10, PC is 1004 afterwards (both in the Address lights
and when examining from the front panel).  I have verified with my logic
analyzer that when running normally (i.e. not single-stepping) the
microcode executes the proper sequence of microinstructions, which is the
same as the sequence executed when single-stepping except at the very end:
when single-stepping, in FLOWS 4, after the D00.90 microinstruction
executes, a branch is taken to BRK.90, which exits back to the console loop.

I don't believe there should be any other differences in execution between
the two paths -- other than the branch at the end there are no conditional
branches or conditional operations based on whether the CPU is
single-stepping or not.  There's a signal somewhere in there that has just
gone a little bit slow... the trick is finding it.

For reference, the microcode sequence (starting at FET.03, see pg. 5 of
http://bitsavers.org/pdf/dec/pdp11/1170/MP0KB11-C0_1170engDrw_Nov75.pdf) is:

334 (FET.03)
260 (FET.10)
343 (IRD.00)
022 (S13.01)
027 (S13.10)
205 (D00.90)
260 (FET.10)
343 (IRD.00)
010 (HLT.00)
316 (HLT.10)
164 (FET.04)
240 (BRK.90)
352 (BRK.00)
170 (CON.00)

You can see it fetching and executing the ADD instruction, then returning
back to FET.10 and executing the next instruction, which is a HALT
instruction (because all other memory contains 0 at this point).  I believe
this is what causes the "+2" portion of the final (incorrect) PC value.
(What's extra odd -- literally -- here is that if you start with a "1" in
R0, the final PC is 3... seemingly indicating a fetch/execution of an
instruction at an odd address, which you'd think would cause a trap
instead...)
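
For comparing logic analyzer captures, a minimal sketch that finds the first
point where two microaddress traces diverge.  The normal-run trace is the
listing above; the single-step trace here is reconstructed from the
description (same through D00.90, then BRK.90) rather than taken from an
actual capture, and the helper name is made up.

```python
# Microaddress (octal) -> flow name, taken from the listing above.
NAMES = {
    0o334: "FET.03", 0o260: "FET.10", 0o343: "IRD.00", 0o022: "S13.01",
    0o027: "S13.10", 0o205: "D00.90", 0o010: "HLT.00", 0o316: "HLT.10",
    0o164: "FET.04", 0o240: "BRK.90", 0o352: "BRK.00", 0o170: "CON.00",
}

# Normal (free-running) trace, as listed above.
normal_run = [0o334, 0o260, 0o343, 0o022, 0o027, 0o205, 0o260,
              0o343, 0o010, 0o316, 0o164, 0o240, 0o352, 0o170]

# Single-step trace: ASSUMED from the description, not a real capture.
single_step = normal_run[:6] + [0o240]

def first_divergence(a, b):
    """Return the index of the first differing entry, or None if none."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None

i = first_divergence(normal_run, single_step)
print(i, NAMES[normal_run[i]], NAMES[single_step[i]])   # 6 FET.10 BRK.90
```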

I've been staring at this for a while and I'm puzzled; everything seems to
execute properly: the instruction is fetched and decoded, the immediate
value is fetched, the ALU does the right thing and the result is properly
stored in R0.  And then the PC gets screwed up.  I can't see how that's
possible from looking at the microcode, so I'm not quite sure where to
start looking.

I sort of suspect the PCB register again, as this is related to the
difference in behavior between single-stepping and normal execution:  the
branch back to the console loop *doesn't* update PCA from PCB, whereas the
branch back to the fetch / decode loop does.

Anyone have any bright ideas as far as what to poke at?
Thanks as always,
- Josh