On Jan 19, 2019, at 4:17 PM, Fritz Mueller via cctalk
<cctalk at classiccmp.org> wrote:
...
> Looking into the parity issue some last night has raised a few questions:
> - There is a lot of inconsistent and incomplete information in the documentation about
> memory CSRs. They appear to come in different flavors depending on memory hardware; some
> of the earlier ones support setting a bit to determine whether parity errors will halt or
> trap the CPU, while some of the later ones (like my MS11-L) simply have "enable"
> and don't distinguish between halt and trap. I'm curious how OS init code sniffs
> out what memory CSRs there are, determines their specific flavors and, in a heterogeneous
> system, determines how much address space is under the auspice of each CSR? Maybe Paul
> and Noel can comment here wrt. RSTS and Unix respectively?

I know essentially nothing about memory parity handling, but I quickly skimmed some RSTS
INIT code (for V10.1). Two things observed:
1. At boot, INIT determines the memory layout. It does this by writing 0 then -2 into
each location to see if it works. If it gets an NXM trap (trap to 4) or a parity trap
(trap to 114), it marks that 1 kW block of memory as non-existent. In the parity case, it
also tells you that it saw a parity error and is disabling that block for that reason.
2. In the DEFAULT option (curiously enough) there is a routine that looks for up to 16
parity CSRs starting at 172100. This happens on entry to the memory layout option. You
can display what it finds by using the PARITY command in response to the "Table
suboption" prompt.
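
As a rough illustration of the sizing pass in item 1, here is a small Python simulation. It
is not taken from INIT: FakeMemory, Trap and probe_block are hypothetical names, addresses
are word indices rather than PDP-11 byte addresses, and the read-back compare is my guess at
what "see if it works" means.

```python
# Toy simulation of a per-block memory sizing pass (all names are hypothetical).
NXM_VECTOR = 0o4         # trap vector for non-existent memory
PARITY_VECTOR = 0o114    # trap vector for memory parity errors
WORDS_PER_BLOCK = 1024   # 1 kW


class Trap(Exception):
    """Stands in for a CPU trap through the given vector."""

    def __init__(self, vector):
        super().__init__(f"trap to {vector:o}")
        self.vector = vector


class FakeMemory:
    """Memory with `blocks` 1 kW blocks; blocks in `bad_parity` trap on read."""

    def __init__(self, blocks, bad_parity=()):
        self.blocks = blocks
        self.bad_parity = set(bad_parity)
        self.words = {}

    def _require_present(self, addr):
        if addr // WORDS_PER_BLOCK >= self.blocks:
            raise Trap(NXM_VECTOR)

    def write(self, addr, value):
        self._require_present(addr)
        self.words[addr] = value & 0o177777

    def read(self, addr):
        self._require_present(addr)
        if addr // WORDS_PER_BLOCK in self.bad_parity:
            raise Trap(PARITY_VECTOR)
        return self.words[addr]


def probe_block(mem, block):
    """Write 0 then -2 to every word of one block and classify the block."""
    base = block * WORDS_PER_BLOCK
    try:
        for addr in range(base, base + WORDS_PER_BLOCK):
            for pattern in (0o000000, 0o177776):  # 0, then -2 as a 16-bit word
                mem.write(addr, pattern)
                if mem.read(addr) != pattern:     # assumption: mismatch = unusable
                    return "non-existent"
    except Trap as trap:
        if trap.vector == PARITY_VECTOR:
            return "parity error"                 # block is disabled, with a message
        return "non-existent"
    return "ok"


mem = FakeMemory(blocks=4, bad_parity={2})
layout = [probe_block(mem, block) for block in range(6)]
print(layout)
# ['ok', 'ok', 'parity error', 'ok', 'non-existent', 'non-existent']
```

Either way the block ends up unused; the parity case only differs in that the reason is
reported.
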
The DEFAULT routine checks whether the bits 007750 are active in a parity CSR; if so, it
takes that to be an address/ECC parity CSR. It figures out the CSR-to-memory association by
going through memory in 1 kW increments: it writes 3 and 5 to the first two words, sets
"write wrong parity" in each CSR (007044), does BIC #3,.. BIC #5,... on those two test
words, then reads them both back. This should leave the test words with bad parity, so it
scans all the CSRs to see which one reports an error (top bit in the CSR). If no CSR has
that bit set, it concludes the particular block is no-parity memory.
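
To make the association loop concrete, here is a minimal Python simulation of the idea.
Again this is a sketch, not the RSTS code: FakeMachine, associate and csr_address are
hypothetical names, addresses are word indices, the CSRs are assumed to sit in consecutive
words starting at 172100, and the 007750 flavor check is left out. The 007044 value is the
one quoted above.

```python
# Toy simulation of mapping 1 kW blocks to parity CSRs (all names are hypothetical).
WORDS_PER_BLOCK = 1024
CSR_BASE = 0o172100
WRITE_WRONG_PARITY = 0o007044   # value quoted in the post
ERROR_BIT = 0o100000            # top bit of the CSR


def csr_address(index):
    """Assumption: one 16-bit word per CSR, at consecutive addresses."""
    return CSR_BASE + 2 * index


class FakeMachine:
    """owners[block] is the index of the CSR covering that block, or None."""

    def __init__(self, owners, csr_count):
        self.owners = owners
        self.csrs = [0] * csr_count
        self.words = {}
        self.bad = set()            # word addresses currently holding wrong parity

    def write(self, addr, value):
        self.words[addr] = value & 0o177777
        owner = self.owners[addr // WORDS_PER_BLOCK]
        wrong = owner is not None and (
            self.csrs[owner] & WRITE_WRONG_PARITY) == WRITE_WRONG_PARITY
        if wrong:
            self.bad.add(addr)
        else:
            self.bad.discard(addr)

    def read(self, addr):
        owner = self.owners[addr // WORDS_PER_BLOCK]
        if owner is not None and addr in self.bad:
            self.csrs[owner] |= ERROR_BIT   # the owning CSR latches the error
        return self.words.get(addr, 0)

    def bic(self, mask, addr):
        """Read-modify-write bit clear, in the spirit of the PDP-11 BIC."""
        self.write(addr, self.read(addr) & ~mask)

    def set_all_csrs(self, value):
        self.csrs = [value] * len(self.csrs)


def associate(machine, blocks):
    """Return {block: CSR index, or None for no-parity memory}."""
    result = {}
    for block in range(blocks):
        base = block * WORDS_PER_BLOCK
        machine.set_all_csrs(0)
        machine.write(base, 3)                 # good parity
        machine.write(base + 1, 5)
        machine.set_all_csrs(WRITE_WRONG_PARITY)
        machine.bic(3, base)                   # rewrites the word with bad parity
        machine.bic(5, base + 1)
        machine.read(base)                     # reading back should flag an error
        machine.read(base + 1)
        hits = [i for i, csr in enumerate(machine.csrs) if csr & ERROR_BIT]
        result[block] = hits[0] if hits else None
        machine.set_all_csrs(0)                # clean up: restore good parity
        machine.write(base, 0)
        machine.write(base + 1, 0)
    return result


machine = FakeMachine(owners=[0, 0, 1, None], csr_count=2)
result = associate(machine, blocks=4)
for block, csr in result.items():
    where = "no parity" if csr is None else f"CSR at {csr_address(csr):o}"
    print(f"block {block}: {where}")
```

The cleanup step at the end of each pass is my addition; a real routine would presumably
also need to leave the test words with good parity before moving on.
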
I probably got some of the details wrong; the above is from a fast skim of the code, but
hopefully it will get you started.
Good to hear you're making progress with the hardware debug!
paul