PDP-11/45 RSTS/E boot problem

Fritz Mueller fritzm at fritzm.org
Sun Jan 20 14:56:51 CST 2019


Thanks, Paul and Noel, for the detailed responses per usual!

> On Jan 20, 2019, at 6:55 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu> wrote:
> 
> What is [MAINDEC ZQMC] complaining about?

Looks like a few more flaky bits in a couple of additional banks.  For those reading along who may be unfamiliar with the MS11-L, it is laid out as 8 physical banks, each containing 18 16K x 1 DRAMs (16 data + 2 parity bits per word).  So a flaky bit in a physical bank implicates one particular chip.
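To make the bank/bit bookkeeping concrete, here is a minimal Python sketch of that layout.  It assumes the 8 banks are laid out contiguously in address order and that bits are numbered 0-15 for data and 16-17 for parity; both are assumptions for illustration, and `locate_chip` is a hypothetical helper.  Mapping a (bank, bit) pair to an actual chip position on the board still requires the prints.

```python
WORDS_PER_BANK = 16 * 1024   # each 16K x 1 DRAM holds one bit of 16K words
BANKS = 8                    # physical banks on the board
BITS_PER_WORD = 18           # 16 data bits + 2 parity bits

TOTAL_CHIPS = BANKS * BITS_PER_WORD


def locate_chip(byte_address, bit):
    """Return (bank, bit) identifying the one DRAM that holds this bit.

    Assumes banks are contiguous in address order (not verified against
    the MS11-L prints).  PDP-11 addresses are byte addresses, so the word
    index is the byte address divided by two.
    """
    word = byte_address >> 1
    if not 0 <= word < WORDS_PER_BANK * BANKS:
        raise ValueError("address outside the board")
    if not 0 <= bit < BITS_PER_WORD:
        raise ValueError("bit must be 0..17")
    return word // WORDS_PER_BANK, bit


if __name__ == "__main__":
    # A bad bit 3 at byte address 0o100000 lands in the second bank.
    print(locate_chip(0o100000, 3))
    print(TOTAL_CHIPS)
```

Each failing (bank, bit) pair reported by a diagnostic therefore points at exactly one of the 144 chips.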

> Would it be possible to put [ZQMC] on a disk and boot it from there?

I have thought about that...  The most efficient way I think would be to work up a simple LDA loader that would fit in a boot sector, and load a diagnostic from contiguous disk starting at the second sector.  It would then be easy to blast down just the boot sector and a single desired diagnostic without imaging an entire pack.
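As a rough illustration of what such a loader has to parse, here is a host-side Python sketch of the DEC absolute loader (LDA) block format as I understand it: a 1 byte, a 0 byte, a 16-bit little-endian byte count (covering the 6 header bytes plus data, but not the checksum), a 16-bit load address, the data, and a checksum byte chosen so that all bytes of the block sum to zero modulo 256.  A block with a byte count of 6 ends the image and carries the start address.  The names `make_block` and `parse_lda` are hypothetical, and the real boot-sector loader would of course be PDP-11 code reading from disk rather than from a byte string.

```python
def make_block(addr, payload=b""):
    """Build one LDA block; an empty payload yields the end/start block."""
    count = 6 + len(payload)
    body = bytes([1, 0, count & 0xFF, count >> 8,
                  addr & 0xFF, addr >> 8]) + bytes(payload)
    # Checksum makes the byte sum of the whole block zero modulo 256.
    return body + bytes([-sum(body) & 0xFF])


def parse_lda(data):
    """Return (segments, start_address).

    segments is a list of (load_address, data_bytes) tuples.
    """
    segments = []
    pos = 0
    while pos < len(data):
        if data[pos] != 1:
            pos += 1                      # skip leader bytes between blocks
            continue
        if pos + 6 > len(data) or data[pos + 1] != 0:
            raise ValueError("bad or truncated block header")
        count = data[pos + 2] | data[pos + 3] << 8
        addr = data[pos + 4] | data[pos + 5] << 8
        end = pos + count                 # index of the checksum byte
        if count < 6 or end >= len(data):
            raise ValueError("bad block length")
        if sum(data[pos:end + 1]) & 0xFF:
            raise ValueError("checksum error")
        if count == 6:
            return segments, addr         # end block: addr is the start address
        segments.append((addr, bytes(data[pos + 6:end])))
        pos = end + 1
    raise ValueError("no end block found")


if __name__ == "__main__":
    image = bytes(3) + make_block(0o1000, b"\x01\x02\x03\x04") + make_block(0o1000)
    print(parse_lda(image))
```

Because the blocks are self-describing, the diagnostic can simply be written to contiguous sectors as-is, and the loader only needs a way to fetch the next byte.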

> One of the first things to add [to custom diagnostic] is to store each location's address in it during a set-up pass, and check to see that it's still there during the checking pass.

I did this last night, actually.  I also added a "random" bits test that uses the program image itself as a source sequence for words to write/compare.

The good news is that my enhanced diagnostics now detect failures in the same physical banks and with the same bits as those flagged by the MAINDEC diagnostic.  This was a good lesson learned: all ones / all zeros is definitely not good enough when checking this sort of thing!

Another thing I found interesting, though, is that the "random" test *also* found a malfunctioning bit that the address test had missed.  So ones/zeros plus address isn't really good enough, either.

I'm technically curious, now, about the failure modes of these sorts of DRAMs.  I guess in addition to stuck bits, there are also potential decode failures (which show up on the address test, but not ones/zeros) and some errors that have history-dependence, perhaps internal latches (which show up on the random data test, but not address or ones/zeros).  I'd guess also there might be potential for crosstalk, noise, and "fading bit" type issues as well?  Will have to see after I make the next round of repairs if there are still additional problems that the MAINDEC flags that my simplistic diag isn't shaking out.
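The way the three tests complement each other can be sketched with a toy model in Python.  Everything here is hypothetical and for illustration only: `FaultyMemory` is not a model of a real 4116, the "bridge" fault (one bit reading as the OR of itself and a neighbour's bit) is just one simple data-dependent fault, and `IMAGE` is an arbitrary list of words standing in for the program image used as a "random" source.

```python
WORD_MASK = 0xFFFF
# Arbitrary 16-bit words standing in for a program image (not a real program).
IMAGE = (0o012706, 0o001000, 0o005000, 0o012701,
         0o177777, 0o010120, 0o077002, 0o000777)


class FaultyMemory:
    """Toy word-addressed memory with at most one injected fault."""

    def __init__(self, size, stuck=None, dead_addr_bit=None, bridge=None):
        self.cells = [0] * size
        self.stuck = stuck                  # (addr, bit, level): stuck bit
        self.dead_addr_bit = dead_addr_bit  # address line stuck at 0
        self.bridge = bridge                # (victim, aggressor, bit)

    def _decode(self, addr):
        if self.dead_addr_bit is not None:
            addr &= ~(1 << self.dead_addr_bit)
        return addr

    def write(self, addr, value):
        self.cells[self._decode(addr)] = value & WORD_MASK

    def read(self, addr):
        value = self.cells[self._decode(addr)]
        if self.stuck and addr == self.stuck[0]:
            _, bit, level = self.stuck
            value = value | (1 << bit) if level else value & ~(1 << bit)
        if self.bridge and addr == self.bridge[0]:
            _, aggressor, bit = self.bridge
            value |= self.cells[aggressor] & (1 << bit)
        return value


def run_pattern(mem, size, pattern):
    """Write pattern(addr) to every word, then return addresses that misread."""
    for addr in range(size):
        mem.write(addr, pattern(addr))
    return [a for a in range(size) if mem.read(a) != (pattern(a) & WORD_MASK)]


def ones_zeros_test(mem, size):
    return (run_pattern(mem, size, lambda a: 0)
            + run_pattern(mem, size, lambda a: WORD_MASK))


def address_test(mem, size):
    return run_pattern(mem, size, lambda a: a)


def image_test(mem, size):
    return run_pattern(mem, size, lambda a: IMAGE[a % len(IMAGE)])


if __name__ == "__main__":
    size = 16
    faults = {
        "stuck bit": dict(stuck=(5, 2, 0)),
        "decode": dict(dead_addr_bit=2),
        "bridge": dict(bridge=(2, 3, 6)),
    }
    for name, kwargs in faults.items():
        results = [bool(test(FaultyMemory(size, **kwargs), size))
                   for test in (ones_zeros_test, address_test, image_test)]
        print(name, "ones/zeros, address, image detected:", results)
```

In this model the stuck bit is caught by ones/zeros, the dead address line passes ones/zeros but fails the address test, and the bridge fault passes both and only shows up when neighbouring words hold different data.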

I've also been somewhat surprised by the level of repair needed on this memory board.  So far, I've seen 6 failed 4116 out of an array of 144 total, so about a 4% failure rate.  Is this typical for vintage 4116, or did somebody leave my poor MS11 out in a lightning storm? :-)

> Starting the CPU (i.e. 'START' switch) or an INIT instruction will clear
> the 'trap enable' bit in the MS11-L CSR.

D'oh!  Yes, thanks; I may very well have mucked that up.  I'll give it another try with a little more care later today.

> Which memory has this [parity halt vs trap] feature?

Hmm, I saw this at least once when researching the variety of CSR formats yesterday morning; I'll have to see if I can dig it up again today.  Might be just a fastbus thing?  It's also hinted at in paragraph 7.7.7 of the older KB11-A maintenance manual (NOT the later edition that covers both KB11-A and KB11-D):

"The semiconductor memory control EHA and EHB (enable halt) flip-flops may be set under program control to assert SMCB PE HALT L if a parity error is detected.  This input also asserts UBCB PARITY ERR SET L, which set the console flag and halts the CPU."

This particular text is removed from the later KB11-A,D maintenance manual, and the description there seems to imply all reported parity conditions trap directly to 114.  But there aren't any details in this section concerning processor revision/version etc. 

The logic design around all this is a bit complicated, and the fact that there are apparent discrepancies between the texts, available prints, and the actual M8106 boards I have on hand is not heartening!

> The M8106 board layout drawing (a couple of pages back from UBCB) does show W1 -
> upper left corner of the board, next to E84.

Yup.  And, surprisingly, neither one of my M8106 has either a jumper or the indicated pull-up at that location!  I'll try to send a pic later.  The fact that W1 exists on the M8119 is interesting; maybe the situation is that the prints are for later revisions, and my actual M8106 are earlier?  My /45 is a very early one -- serial 154!

    cheers,
      --FritzM.


