On 2019-Feb-06, at 10:53 AM, Noel Chiappa via cctalk wrote:
I'm not sure that's going to tell us much: the latest development is that
Fritz looked at the actual memory contents again, and it is once again
trash; _almost_ identical to what was there before:
PA:171600: 016162 004767 000224 000414 006700 006152 006702 006144
but with some extra 010000 bits:
PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144
(It's not clear if this represents a real difference, or if that
front panel issue Fritz mentioned caused the contents to be displayed
incorrectly.)
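A quick way to see exactly which bits differ between the two dumps is to XOR them word by word. This is only a sketch over the eight words quoted above; the names `before`, `latest` and `diff_words` are made up for illustration.

```python
# The two front-panel dumps quoted above, as octal words starting at PA 171600.
BASE = 0o171600
before = [0o016162, 0o004767, 0o000224, 0o000414,
          0o006700, 0o006152, 0o006702, 0o006144]
latest = [0o016162, 0o004767, 0o000224, 0o000414,
          0o016700, 0o016152, 0o016702, 0o016144]

def diff_words(old, new, base):
    """Return (physical address, XOR mask) for every word that changed."""
    return [(base + 2 * i, o ^ n)
            for i, (o, n) in enumerate(zip(old, new))
            if o != n]

diffs = diff_words(before, latest, BASE)
for addr, mask in diffs:
    print(f"PA:{addr:06o}: changed bits {mask:06o}")
```

The last four words differ, each by the single bit 010000 (bit 12); the first four are identical.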
The exciting thing is that if the latter really is what's in main memory,
that '16700 16152' at the PC of the MM trap could indeed generate the MM trap
we're seeing: it's "MOV 26364, R0", and that address is in segment (page) 1,
which is only 03500 long....
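The arithmetic behind that can be sketched as below. Two things here are assumptions, not taken from the dump: the virtual address of the instruction (010206, back-computed so that the target comes out as the 26364 quoted above) and treating the 03500 page length as a byte count compared directly against the offset within the page. The function names are made up for illustration.

```python
PAGE_SHIFT = 13          # top 3 bits of a 16-bit virtual address select the page
PAGE_OFFSET_MASK = 0o17777

def relative_target(opcode, offset, instr_va):
    """Source address of 'MOV X(PC), Rn' (source mode 6, register 7)."""
    if opcode >> 12 != 0o01:
        raise ValueError("not a word MOV")
    if (opcode >> 6) & 0o77 != 0o67:
        raise ValueError("source operand is not PC-relative")
    # The PC has already stepped past the opcode and the offset word.
    pc_after_fetch = (instr_va + 4) & 0xFFFF
    return (pc_after_fetch + offset) & 0xFFFF

def page_check(va, page_len_bytes):
    """Return (page number, offset in page, True if inside the page)."""
    page = va >> PAGE_SHIFT
    offset = va & PAGE_OFFSET_MASK
    return page, offset, offset < page_len_bytes

# ASSUMPTION: instruction VA 010206 is inferred, not read from the machine.
target = relative_target(0o016700, 0o016152, 0o010206)
page, offset, ok = page_check(target, 0o3500)
print(f"target {target:06o}: page {page}, offset {offset:05o}, in bounds: {ok}")
```

With those assumptions the target lands at offset 06364 in page 1, well past 03500, which is consistent with a length trap.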
If so, i) we're down to one problem (good news), and ii) our problem turns into
finding out how that section of the code got trashed (bad news). Which is not
going to be simple, alas, I suspect. I don't think it's the RK11, because
Unix reads the program image into system buffers in low memory, and that's
clearly working OK in the 'sleep;ls' case. (It may not use the exact same
buffers, though...) It then copies it out to the memory where it's going to
execute from, using an MTPI loop. So maybe the memory still has issues, or
maybe the MTPI isn't working with some main memory locations or or or...
I haven't followed this in enough detail to know what configuration and memory
board are in play, so maybe this can be ruled out from your end, but for
consideration: what about the refresh circuitry of the memory board?
Mem diagnostics, unless they explicitly account for it, may not show up problems
with memory refresh if the loop times are short enough to effectively substitute
for refresh cycles, while the problems could show up later in real-world use
with arbitrary time between accesses.
Refresh on some early boards/systems was asynchronously timed by monostables or
onboard oscillators, which can drift or fail on the margin/slope. (I don't know
what DEC's design policy was for DRAM refresh.)
It might also explain why a number of 4116s were (apparently) failing earlier in
the efforts (if I recall the discussion correctly): replacing them might have
just swapped in 'slightly better' chips, i.e. ones with a slightly longer
refresh tolerance.
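To illustrate how a tight diagnostic loop can mask dead refresh, here is a toy model, not a description of any DEC board. Everything in it is an assumption for illustration: the class and function names, the 1 us access time, and the 2 ms retention figure (the commonly quoted 4116 refresh interval, 128 rows). The only mechanism modelled is that accessing a row rewrites it, so the access itself acts as a refresh.

```python
class ToyDram:
    """Toy DRAM: a row loses its data if left untouched longer than retention_ms."""

    def __init__(self, refresh_works, rows=128, retention_ms=2.0):
        self.refresh_works = refresh_works
        self.rows = rows
        self.retention_ms = retention_ms
        self.now = 0.0                      # simulated time in ms
        self.cells = [0] * rows
        self.last_touch = [0.0] * rows

    def _touch(self, row):
        # Without working refresh, only an access keeps a row alive.
        stale = self.now - self.last_touch[row] > self.retention_ms
        if stale and not self.refresh_works:
            self.cells[row] = None          # charge leaked away, contents unknown
        self.last_touch[row] = self.now

    def write(self, row, value):
        self._touch(row)
        self.cells[row] = value

    def read(self, row):
        self._touch(row)
        return self.cells[row]

    def wait(self, ms):
        self.now += ms


def pattern_test(dram, gap_ms, access_ms=0.001):
    """Write a pattern to every row, idle for gap_ms, then read it back."""
    for r in range(dram.rows):
        dram.write(r, r ^ 0o125)
        dram.wait(access_ms)
    dram.wait(gap_ms)
    ok = True
    for r in range(dram.rows):
        ok = ok and dram.read(r) == (r ^ 0o125)
        dram.wait(access_ms)
    return ok


print(pattern_test(ToyDram(refresh_works=False), gap_ms=0.0))   # tight loop
print(pattern_test(ToyDram(refresh_works=False), gap_ms=50.0))  # idle gap
print(pattern_test(ToyDram(refresh_works=True), gap_ms=50.0))   # healthy refresh
```

In this model the back-to-back test passes even with refresh broken, and only the version with an idle gap between write and read catches it. Raising `retention_ms` stands in for a 'slightly better' chip that tolerates a longer gap.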