Thanks, Paul and Noel, for the detailed responses per usual!
On Jan 20, 2019, at 6:55 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu> wrote:
> What is [MAINDEC ZQMC] complaining about?
Looks like a few more flaky bits in a couple of additional banks. For those reading along
who may be unfamiliar with the MS11-L, it is laid out as 8 physical banks, each containing
18 16K x 1 DRAMs (16 data + 2 parity bits per word), so a flaky bit in a physical bank
implicates one particular chip.
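To make the bank/bit/chip relationship concrete, here is a small Python model of one stored
word. It assumes one parity bit per byte with odd parity, and it uses a hypothetical bit
ordering and chip numbering (`stored_bits` and `chip_number` are illustrative names, not
anything from DEC documentation); the real assignments would have to come from the MS11-L
prints.

```python
# Toy model of one MS11-L word: 16 data bits plus 2 parity bits, stored in
# 18 one-bit-wide 16K x 1 chips per bank.
# ASSUMPTIONS (not checked against the prints): one parity bit per byte,
# odd parity, and the bit/chip numbering used below.

BANKS = 8
CHIPS_PER_BANK = 18  # 16 data + 2 parity

def odd_parity_bit(byte: int) -> int:
    """Bit that makes the number of 1s in (byte + parity bit) odd."""
    return 0 if bin(byte & 0xFF).count("1") % 2 else 1

def stored_bits(word: int) -> list:
    """The 18 bits written for one word: bits 0-15 are data, bit 16 is
    low-byte parity, bit 17 is high-byte parity (hypothetical ordering)."""
    data = [(word >> i) & 1 for i in range(16)]
    return data + [odd_parity_bit(word), odd_parity_bit(word >> 8)]

def parity_error(bits: list) -> bool:
    """True if either byte no longer has odd parity when read back."""
    low_ones = sum(bits[0:8]) + bits[16]
    high_ones = sum(bits[8:16]) + bits[17]
    return low_ones % 2 == 0 or high_ones % 2 == 0

def chip_number(bank: int, bit: int) -> int:
    """Hypothetical linear chip index: one chip per (bank, stored bit)."""
    if not (0 <= bank < BANKS and 0 <= bit < CHIPS_PER_BANK):
        raise ValueError("bank must be 0-7, bit 0-17")
    return bank * CHIPS_PER_BANK + bit
```

Flipping any single stored bit of a word makes `parity_error` return True, and because each
bank stores each bit position in its own chip, a repeatable (bank, bit) failure points at one
device; 8 banks of 18 chips gives the 144-chip array.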
> Would it be possible to put [ZQMC] on a disk and boot it from there?
I have thought about that. The most efficient way, I think, would be to work up a simple
LDA loader that fits in a boot sector and loads a diagnostic from contiguous disk sectors
starting at the second sector. It would then be easy to blast down just the boot sector
and a single desired diagnostic without imaging an entire pack.
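A boot-sector loader like that mostly has to walk the absolute-loader (LDA) block format.
Below is a Python sketch of that walk, useful for sanity-checking an image on the host before
writing it to disk. The block layout it assumes (a 001 000 header, a 16-bit byte count that
includes the six header bytes, a 16-bit load address, the data, then a checksum byte that makes
the block sum to zero mod 256, with a data-less block carrying the start address) is my
understanding of DEC's absolute loader format and is worth double-checking. `load_lda` and
`make_block` are hypothetical names.

```python
from typing import Optional

def make_block(addr: int, data: bytes) -> bytes:
    """Build one absolute-loader block (handy for testing the parser)."""
    count = 6 + len(data)  # byte count includes the 6 header bytes
    body = bytes([1, 0, count & 0xFF, count >> 8, addr & 0xFF, addr >> 8]) + data
    return body + bytes([-sum(body) & 0xFF])  # checksum: block sums to 0 mod 256

def load_lda(image: bytes, memory: bytearray) -> Optional[int]:
    """Copy each data block into `memory` (byte-addressed) and return the
    transfer address from the final data-less block, or None if that address
    is odd (assumed: the loader halts instead of jumping)."""
    i = 0
    while True:
        while i < len(image) and image[i] == 0:  # skip blank leader
            i += 1
        if i + 6 > len(image):
            raise ValueError("image ended before the end block")
        if image[i] != 1 or image[i + 1] != 0:
            raise ValueError(f"bad block header at offset {i}")
        count = image[i + 2] | (image[i + 3] << 8)
        addr = image[i + 4] | (image[i + 5] << 8)
        end = i + count  # index of the checksum byte
        if count < 6 or end >= len(image):
            raise ValueError(f"bad byte count at offset {i}")
        if sum(image[i:end + 1]) & 0xFF:
            raise ValueError(f"checksum error in block at offset {i}")
        data = image[i + 6:end]
        if not data:  # byte count of 6: end block
            return None if addr & 1 else addr
        if addr + len(data) > len(memory):
            raise ValueError(f"block at offset {i} overruns memory")
        memory[addr:addr + len(data)] = data
        i = end + 1
```

For example, `load_lda(b"\0" * 4 + make_block(0o1000, b"\x01\x02") + make_block(0o200, b""), mem)`
copies two bytes to 001000 and returns 000200. The boot-sector version would do the same walk in
MACRO-11 over the sectors it reads; this Python version only checks an image on the host.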
> One of the first things to add [to custom diagnostic]
> is to store each location's address in it during a set-up pass, and check to see that
> it's still there during the checking pass.
I did this last night, actually. I also added a "random" bits test that uses
the program image itself as a source sequence for words to write/compare.
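The two passes described above can be sketched in Python against any word memory exposed as
read/write callables. This is an illustration under assumptions, not the actual diagnostic:
`address_test` and `image_test` are hypothetical names, addresses are word indexes, and the
"image" is just a caller-supplied sequence of 16-bit words standing in for the program image.

```python
from typing import Callable, List, Sequence, Tuple

Read = Callable[[int], int]
Write = Callable[[int, int], None]
Mismatch = Tuple[int, int, int]  # (word address, expected, read back)

def address_test(read: Read, write: Write, nwords: int) -> List[Mismatch]:
    """Set-up pass stores each word's own address; checking pass verifies it.
    Only 16 bits fit in a word, so words 64K apart get the same value and an
    alias between them would slip through this particular pass."""
    for a in range(nwords):
        write(a, a & 0xFFFF)
    bad = []
    for a in range(nwords):
        got = read(a)
        if got != a & 0xFFFF:
            bad.append((a, a & 0xFFFF, got))
    return bad

def image_test(read: Read, write: Write, nwords: int,
               image: Sequence[int], offset: int = 0) -> List[Mismatch]:
    """Use the program's own words as a cheap, repeatable pseudo-random
    sequence; changing `offset` between passes shifts the sequence so each
    location sees different data."""
    def expected(a: int) -> int:
        return image[(a + offset) % len(image)] & 0xFFFF
    for a in range(nwords):
        write(a, expected(a))
    bad = []
    for a in range(nwords):
        got = read(a)
        if got != expected(a):
            bad.append((a, expected(a), got))
    return bad
```

With a plain Python list as the "memory", `address_test(mem.__getitem__, mem.__setitem__, len(mem))`
returns an empty list; a read function that corrupts one location shows up as a single mismatch
tuple.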
The good news is that my enhanced diagnostics now detect failures in the same physical
banks and with the same bits as those flagged by the MAINDEC diagnostic. This was a good
lesson learned: all ones / all zeros is definitely not good enough when checking this sort
of thing!
Another thing I found interesting, though, is that the "random" test *also*
found a malfunctioning bit that the address test had missed. So ones/zeros plus address
isn't really good enough, either.
I'm now curious about the failure modes of these sorts of DRAMs. I'd guess that, in
addition to stuck bits, there are potential decode failures (which show up on the address
test but not on ones/zeros) and some errors with history dependence, perhaps from internal
latches (which show up on the random-data test but not on address or ones/zeros). I'd also
guess there might be crosstalk, noise, and "fading bit" type issues? After I make the next
round of repairs, I'll have to see whether the MAINDEC still flags additional problems that
my simplistic diag isn't shaking out.
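The point about test coverage can be illustrated with a toy simulation. The Python sketch
below injects three hypothetical fault models into a 16-word memory (a stuck-at-1 bit, an
address-decode alias, and a contrived write-coupling fault where writing a 1 into one cell
also sets the same bit in another cell) and reports which write/read-back patterns catch each
one. The fault models and the `IMAGE` words are made up for illustration. Real 4116 failures
can be messier, and refresh, timing, and "fading bit" behavior are not modeled at all.

```python
from typing import Callable, Dict

IMAGE = [0o012706, 0o001000, 0o005037, 0o177570]  # arbitrary stand-in words

class FaultyMemory:
    """Toy word memory with at most one injected (hypothetical) fault."""
    def __init__(self, nwords, stuck=None, alias=None, coupling=None):
        self.cells = [0] * nwords
        self.stuck = stuck        # (addr, bit, value): that bit always reads value
        self.alias = alias        # (addr_a, addr_b): addr_a decodes to addr_b's cell
        self.coupling = coupling  # (aggressor, victim, bit): writing a 1 to that
                                  # bit at aggressor also sets it at victim

    def _cell(self, a):
        return self.alias[1] if self.alias and a == self.alias[0] else a

    def write(self, a, v):
        self.cells[self._cell(a)] = v & 0xFFFF
        if self.coupling:
            agg, vic, bit = self.coupling
            if a == agg and (v >> bit) & 1:
                self.cells[vic] |= 1 << bit

    def read(self, a):
        v = self.cells[self._cell(a)]
        if self.stuck and a == self.stuck[0]:
            _, bit, val = self.stuck
            v = (v & ~(1 << bit)) | (val << bit)
        return v

PATTERNS: Dict[str, Callable[[int], int]] = {
    "ones": lambda a: 0xFFFF,
    "zeros": lambda a: 0,
    "address": lambda a: a & 0xFFFF,
    "image": lambda a: IMAGE[a % len(IMAGE)],
}

def detects(mem, n, pattern) -> bool:
    """Write the pattern everywhere, then report any read-back mismatch."""
    for a in range(n):
        mem.write(a, pattern(a))
    return any(mem.read(a) != pattern(a) for a in range(n))

def survey(n=16):
    faults = {
        "stuck-at-1": dict(stuck=(7, 4, 1)),
        "decode alias": dict(alias=(5, 3)),
        "coupling": dict(coupling=(2, 1, 0)),
    }
    # A fresh memory per (fault, pattern) so no state carries between runs.
    return {name: {p: detects(FaultyMemory(n, **kw), n, f)
                   for p, f in PATTERNS.items()}
            for name, kw in faults.items()}
```

With these particular choices, the stuck bit escapes the all-ones and image passes, the alias
escapes ones/zeros, and the coupling fault is caught only by the image pattern, so no single
pattern covers everything.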
I've also been somewhat surprised by the amount of repair needed on this memory board.
So far, I've found 6 failed 4116s out of an array of 144, roughly a 4% failure rate.
Is this typical for vintage 4116s, or did somebody leave my poor MS11 out in a
lightning storm? :-)
> Starting the CPU (i.e. 'START' switch) or an INIT instruction will clear
> the 'trap enable' bit in the MS11-L CSR.
D'oh! Yes, thanks; I may very well have mucked that up. I'll give it another try
with a little more care later today.
> Which memory has this [parity halt vs trap] feature?
Hmm, I saw this at least once when researching the variety of CSR formats yesterday
morning; I'll have to see if I can dig it up again today. It might just be a fastbus
thing? It's also hinted at in paragraph 7.7.7 of the older KB11-A maintenance manual
(NOT the later edition that covers both KB11-A and KB11-D):
"The semiconductor memory control EHA and EHB (enable halt) flip-flops may be set
under program control to assert SMCB PE HALT L if a parity error is detected. This input
also asserts UBCB PARITY ERR SET L, which set the console flag and halts the CPU."
This particular text is removed from the later KB11-A,D maintenance manual, and the
description there seems to imply that all reported parity conditions trap directly to 114.
But there aren't any details in this section concerning processor revision/version, etc.
The logic design around all this is a bit complicated, and the fact that there are
apparent discrepancies between the texts, available prints, and the actual M8106 boards I
have on hand is not heartening!
> The M8106 board layout drawing (a couple of pages back
> from UBCB) does show W1 - upper left corner of the board, next to E84.
Yup. And, surprisingly, neither of my M8106s has either a jumper or the indicated
pull-up at that location! I'll try to send a pic later. The fact that W1 exists on
the M8119 is interesting; maybe the prints are for later revisions and my actual
M8106s are earlier? My /45 is a very early one: serial 154!
cheers,
--FritzM.