Happy weekend, all! Latest updates on this issue:
Identified and replaced a faulty 4116 DRAM (E204) on my MS11-L. After this, my small
hand-rolled standalone diagnostic passes the full 256K. I'll post my diagnostic
source over on my blog soon.
After this repair, tried MAINDEC ZQMC, called out as the appropriate diagnostic by the
MS11-L docs. This was interesting... First, it would barely run at all unless I disabled
parity checking with front panel switch settings. Second, it flagged a bunch of memory
locations that weren't reported by my much simpler diagnostic (which only does
all-ones/all-zeros passes looking for stuck bits at this point).
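For anyone following along, here is a rough model of what an all-ones/all-zeros pass does. This is not my actual diagnostic (that is PDP-11 code, not posted yet), just a Python sketch; the `FaultyMemory` class, its stuck-bit tables and the example addresses are all made up for illustration.

```python
WORD_MASK = 0xFFFF  # 16 data bits per PDP-11 word


class FaultyMemory:
    """Toy word-addressed memory with optional stuck-at bits (hypothetical model)."""

    def __init__(self, size_words, stuck_low=None, stuck_high=None):
        self.cells = [0] * size_words
        self.stuck_low = stuck_low or {}    # addr -> mask of bits stuck at 0
        self.stuck_high = stuck_high or {}  # addr -> mask of bits stuck at 1

    def write(self, addr, value):
        self.cells[addr] = value & WORD_MASK

    def read(self, addr):
        value = self.cells[addr]
        value &= ~self.stuck_low.get(addr, 0)   # force stuck-at-0 bits low
        value |= self.stuck_high.get(addr, 0)   # force stuck-at-1 bits high
        return value & WORD_MASK


def stuck_bit_test(mem, size_words):
    """Run all-zeros then all-ones passes; return {addr: mask of failing bits}."""
    failures = {}
    for pattern in (0x0000, WORD_MASK):
        for addr in range(size_words):
            mem.write(addr, pattern)
        for addr in range(size_words):
            bad = mem.read(addr) ^ pattern  # bits that did not read back
            if bad:
                failures[addr] = failures.get(addr, 0) | bad
    return failures


# Made-up faults: bit 2 stuck low at 0o100, bit 15 stuck high at 0o777.
mem = FaultyMemory(1024, stuck_low={0o100: 0o000004}, stuck_high={0o777: 0o100000})
for addr, mask in sorted(stuck_bit_test(mem, 1024).items()):
    print(f"addr {addr:06o}: bad bits {mask:06o}")
```

A pass like this only catches bits that can't hold a 0 or a 1. It says nothing about addressing faults or cells that disturb each other, which may be part of why the bigger diagnostic flags more locations.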
The MAINDEC memory diagnostic is bulky and complicated, and it takes several minutes to
re-download it after a power cycle, so it's not exactly convenient to use while
troubleshooting. I'll probably be beefing up my smaller diagnostic with a few more
tests (including parity).
Went ahead and tried both RSTS and Unix again after the above repair, and saw the same
fault behaviors from both (sadness). Oh well, not there yet...
So, smokiest gun I have right now is the parity issue. Could be I still have a bad DRAM
on my MS11 in one of the parity banks... I tried enabling trap on parity error in the
MS11 CSR before running my diagnostic, but it didn't trap, even though it did flag
parity error(s) in the CSR. So maybe I *also* have a bug I haven't yet addressed in
parity handling within the CPU. I realized there is a MAINDEC specifically for this (CKBR)
which I had previously overlooked. May give that a look today. Also, parity is one
significant difference between SIMH and my real hardware: SIMH emulates a memory system
with no parity hardware.
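To make the "bad DRAM in a parity bank" idea concrete, here is a small Python sketch of per-byte parity. It assumes one parity bit per byte and odd parity; both are assumptions for illustration, not something I have confirmed against the MS11-L docs, and the function names are made up.

```python
def parity_bit(byte):
    """Parity bit that makes the total count of 1s (data + parity) odd (assumed odd parity)."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 else 1


def store(word):
    """Split a 16-bit word into two bytes, each stored with its own parity bit."""
    low, high = word & 0xFF, (word >> 8) & 0xFF
    return {"low": low, "p_low": parity_bit(low),
            "high": high, "p_high": parity_bit(high)}


def check(cell):
    """Return the list of bytes whose stored parity no longer matches the data."""
    errors = []
    if parity_bit(cell["low"]) != cell["p_low"]:
        errors.append("low")
    if parity_bit(cell["high"]) != cell["p_high"]:
        errors.append("high")
    return errors


good = store(0o123456)
print(check(good))                   # no mismatch on a clean cell

bad_data = dict(good, low=good["low"] ^ 0x01)          # one data bit flipped
print(check(bad_data))

bad_parity = dict(good, p_high=good["p_high"] ^ 1)     # parity chip returns wrong bit
print(check(bad_parity))
```

The last case is the interesting one: the data reads back perfectly, so a data-only compare passes, but the parity logic still flags the location.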
Looking into the parity issue some last night has raised a few questions:
- There is a lot of inconsistent and incomplete information in the documentation about
memory CSRs. They appear to come in different flavors depending on memory hardware; some
of the earlier ones support setting a bit to determine whether parity errors will halt or
trap the CPU, while some of the later ones (like my MS11-L) simply have an "enable" bit
and don't distinguish between halt and trap. I'm curious how OS init code sniffs
out what memory CSRs there are, determines their specific flavors and, in a heterogeneous
system, determines how much address space is under the auspices of each CSR. Maybe Paul
and Noel can comment here wrt. RSTS and Unix respectively?
- The 11/45 prints show a jumper (W1, lower left of sheet UBCB) that looks like it would
entirely disable Unibus parity error detection if removed. This was an obvious thing to
check, but when I pulled and examined my UBC board (and also looked over my spare) no such
jumper or any associated pads were anywhere to be found! So maybe this was
added to or removed from a later etch of the UBC? Anybody know more on this?
My UBC has required three separate repairs so far in the course of restoring this machine,
in order to address various independent issues. We may now be coming up on #4...
Based also on the rat's nest of green wires on these boards and the frustrated-looking
engineer scrawl *all* over this page of the prints, the UBC really is the heart of
darkness of the KB11-A :-)
cheers,
--FritzM.