From: Fritz Mueller
it flagged a bunch of memory locations that weren't reported by my much
simpler diagnostic (which only does all-ones/all-zeros passes looking for
stuck bits at this point.)
What is it complaining about?
The MAINDEC memory diagnostic is bulky and complicated, and it takes
several minutes to re-download it after a power cycle, so it's not
exactly convenient to use while troubleshooting.
Would it be possible to put it on a disk and boot it from there? If it's in
some documented format (e.g. .LDA), I can easily produce a Unix disk with it
on, if that would help (loading the image onto the physical pack would
take forever, I guess, but you could let it run overnight).
It's probably not worth trying to devise a way to load individual files onto
a Unix disk over the serial line until Unix is working reliably, so the
program can run under Unix (otherwise a stand-alone program would have to
include file-system code).
I'll probably be beefing up my smaller diagnostic with a few more tests
(including parity).
One of the first things to add is to store each location's address in it during
a set-up pass, and check to see that it's still there during the checking pass.
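To illustrate why that test catches things the all-ones/all-zeros passes miss, here is a minimal sketch in Python (a simulation, not PDP-11 code). The `AliasedMemory` class, its `stuck_bit` parameter, and the function names are all hypothetical, made up for the example; the fault modelled is an address line stuck at 0, which is only one of the failures such a test could find.

```python
class AliasedMemory:
    """Hypothetical word-addressed memory; optionally one address bit is stuck at 0."""

    def __init__(self, size, stuck_bit=None):
        self.cells = [0] * size
        # With a stuck address bit, two addresses select the same cell.
        self.mask = ~(1 << stuck_bit) if stuck_bit is not None else -1

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, addr):
        return self.cells[addr & self.mask]

    def __setitem__(self, addr, value):
        self.cells[addr & self.mask] = value


def pattern_test(mem, value):
    """All-ones/all-zeros style pass: write one value everywhere, then read back."""
    for addr in range(len(mem)):
        mem[addr] = value
    return [addr for addr in range(len(mem)) if mem[addr] != value]


def address_test(mem):
    """Set-up pass stores each address in its own location; checking pass verifies it."""
    for addr in range(len(mem)):
        mem[addr] = addr
    return [addr for addr in range(len(mem)) if mem[addr] != addr]


if __name__ == "__main__":
    bad = AliasedMemory(16, stuck_bit=3)
    print("pattern test failures:", pattern_test(bad, 0o177777))
    print("address test failures:", address_test(bad))
```

In this model the pattern pass reports nothing, because every cell that can be reached holds the right data; the address pass fails on the lower half, where the aliased writes from the upper half landed.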
Went ahead and tried both RSTS and Unix again after the above repair,
and saw the same fault behaviors from both (sadness).
Yeah, sounds like you still have memory issues (per the diagnostic grumping).
I tried enabling trap on parity error in the MS11 CSR before running my
diagnostic, but it didn't trap, even though it did flag parity error(s)
in the CSR. So maybe I *also* have a bug I haven't yet addressed in
parity handling within the CPU.
Starting the CPU (i.e. 'START' switch) or an INIT instruction will clear
the 'trap enable' bit in the MS11-L CSR.
I'd modify your program to set it, and check to see if you're getting
parity error traps. (Clearly, if that hardware - either in the MS11-L,
or the CPU - isn't working you need to look at that first.)
some of the earlier ones support setting a bit to determine whether
parity errors will halt or trap the CPU
Huh? I was just looking at parity in the MM11-L and MM11-U (to see if
parity needed to be enabled on them, or if it's always on by default),
and I didn't see that. Also, there's no way I know of, on the UNIBUS,
for anything to halt the CPU (the QBUS has such a line, but not the
UNIBUS). Which memory has this feature?
I'm curious how OS init code sniffs out what memory CSRs there are,
determines their specific flavors and, in a heterogeneous system,
determines how much address space is under the auspices of each CSR?
Unix V6 does nothing at all with parity (doesn't enable it in memory modules,
although the memory that was extant at the time - MM11-S, MM11-U, etc - did
support it as an option).
If one turned it on, the code _would_ catch the trap and 'panic' (print a
message and halt operation). It would be pretty easy to modify the code to
send a signal to the process if it happened in User mode. I'm not sure there's
much to be done if it happens in Kernel mode.
V6 sizes memory by doing a read every 0100 bytes (of the xxxx00 byte), looking
for success or a trap. If that succeeds, it clears the 32. sequential words
starting at that address, and then tries the next 0100. (So if you modified
the code to enable parity traps, you wouldn't have to deal with bad parity
left over from random contents at power-on....)
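A rough sketch of that sizing loop, as a Python simulation rather than anything taken from the V6 source: `FakeBus`, `BusTimeout`, `size_memory` and the `limit` argument are hypothetical names for the example, and the simulated memory is assumed to be installed in whole multiples of 0100 bytes.

```python
CLICK = 0o100  # probe step in bytes: 0100 octal = 64 bytes = 32 words


class BusTimeout(Exception):
    """Stands in for the trap taken when nothing responds at an address."""


class FakeBus:
    """Hypothetical bus with 'installed_bytes' of memory starting at address 0."""

    def __init__(self, installed_bytes):
        # Fill with a non-zero value to mimic random power-on contents.
        self.words = {addr: 0o123456 for addr in range(0, installed_bytes, 2)}

    def read_word(self, addr):
        if addr not in self.words:
            raise BusTimeout(addr)
        return self.words[addr]

    def write_word(self, addr, value):
        if addr not in self.words:
            raise BusTimeout(addr)
        self.words[addr] = value & 0o177777


def size_memory(bus, limit):
    """Probe every CLICK bytes; clear each block that responds. Returns bytes found."""
    addr = 0
    while addr < limit:
        try:
            bus.read_word(addr)  # a read of the xxxx00 location
        except BusTimeout:
            break  # trap: no memory here, so sizing stops
        # The probe succeeded: clear the 32 sequential words of this block.
        for offset in range(0, CLICK, 2):
            bus.write_word(addr + offset, 0)
        addr += CLICK
    return addr


if __name__ == "__main__":
    bus = FakeBus(0o20000)
    print(oct(size_memory(bus, 0o160000)))
```

Because every word found gets written during sizing, all of it has been given known contents before anything else reads it, which is the point made above about leftover bad parity.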
The 11/45 prints show a jumper (W1, lower left of sheet UBCB) that
looks like it would entirely disable Unibus parity error detection if
removed.
Yup, that's what it looks like to me too..
when I pulled and examined my UBC board (and also looked over my spare)
no such jumper or any associated pads were anywhere to be found! So maybe
this was either added/removed from later etches of the UBC?
Well, if you have an M8106, you do have a KB11-A; in the later /45 CPU, the
KB11-D, that has been replaced by the M8119 - but that still has W1! (The
KB11-D prints are in MP00039, 11/55 Vol 1.) I looked on my M8119, and W1 is
indeed there - it's a 0-ohm 'resistor' (single black band) just less than
half-way up the 4th column of chips, with a '1' next to it in the etch. The
M8106 board layout drawing (a couple of pages back from UBCB) does show W1 -
upper left corner of the board, next to E84.
Noel