PDP-11/45 RSTS/E boot problem

Wed Feb 6 14:54:14 CST 2019

On 2019-Feb-06, at 10:53 AM, Noel Chiappa via cctalk wrote:
> 
> I'm not sure that's going to tell us much: the latest development is that
> Fritz looked at the actual memory contents again, and it is once again
> trash; _almost_ identical to what was there before:
> 
>  PA:171600: 016162 004767 000224 000414 006700 006152 006702 006144
> 
> but with some extra 010000 bits:
> 
>  PA:171600: 016162 004767 000224 000414 016700 016152 016702 016144
> 
> (It's not clear if this represents a real difference, or if that 
> front panel issue Fritz mentioned caused the contents to be displayed
> incorrectly.)
> 
> The exciting thing is that if the latter really is what's in main memory,
> that '16700 16152' at the PC of the MM trap could indeed generate the MM trap
> we're seeing: it's "MOV 26364, R0", and that address is in segment (page) 1,
> which is only 03500 long....
> 
> If so, i) we're down to one problem (good news), and our problem turns into
> finding out how that section of the code got trashed (bad news). Which is not
> going to be simple, alas, I suspect. I don't think it's the RK11, because
> Unix reads the program image into system buffers in low memory, and that's
> clearly working OK in the 'sleep;ls' case. (It may not use the exact same
> buffers, though...) It then copies it out to the memory where it's going to
> execute from, using an MTPI loop. So maybe the memory still has issues, or
> maybe the MTPI isn't working with some main memory locations or or or...
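
The arithmetic in the quoted message can be checked with a short sketch. This is a toy written for this post, not
any existing tool, and it assumes the standard PDP-11 double-operand encoding and 8 KB MMU pages. It XORs the two
dumps to confirm that only the 010000 bit differs, splits 016700 into its fields, and locates 026364 within the page
layout. The length check is simplified to a byte comparison (the real PDR length field counts 64-byte blocks), and
since the instruction's virtual address isn't given in the thread, the sketch starts from the 026364 target rather
than computing it.

```python
# Toy check of the quoted dump, assuming standard PDP-11 instruction encoding
# and an 8 KB-page MMU layout. Words are written in octal, as in the thread.

BEFORE = [0o016162, 0o004767, 0o000224, 0o000414,
          0o006700, 0o006152, 0o006702, 0o006144]
AFTER = [0o016162, 0o004767, 0o000224, 0o000414,
         0o016700, 0o016152, 0o016702, 0o016144]


def diff_bits(before, after):
    """XOR corresponding words: a nonzero result marks the bits that changed."""
    return [b ^ a for b, a in zip(before, after)]


def decode_double_operand(word):
    """Split a double-operand instruction into its octal fields."""
    return {
        "opcode": (word >> 12) & 0o17,  # 01 = MOV (word)
        "src_mode": (word >> 9) & 0o7,  # mode 6 = index, X(Rn)
        "src_reg": (word >> 6) & 0o7,   # with R7 (PC) this is relative addressing
        "dst_mode": (word >> 3) & 0o7,
        "dst_reg": word & 0o7,
    }


def page_and_offset(va):
    """Page number (VA bits 15-13) and byte offset within that 8 KB page."""
    return va >> 13, va & 0o17777


def exceeds_length(va, page_length):
    """Simplified length check comparing byte offsets. The real PDR field
    counts 64-byte blocks, so this is only an approximation."""
    return page_and_offset(va)[1] > page_length


print([oct(w) for w in diff_bits(BEFORE, AFTER)])
print(decode_double_operand(0o016700))
print(page_and_offset(0o026364), exceeds_length(0o026364, 0o3500))
```

With mode 6 on R7, the effective address is the index word (016152) added to the PC, which by then points just past
that index word. A target in page 1 at offset 06364 is well beyond a 03500 length, so an abort there is plausible.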

I haven't followed this closely enough to know which configuration and memory board are in play, so this may already
have been ruled out on your end, but for consideration: what about the refresh circuitry on the memory board?

Memory diagnostics, unless they explicitly account for it, may not expose refresh problems: if the test loop revisits
every row quickly enough, its own accesses effectively substitute for refresh cycles. The same fault could then show
up later in real-world use, where the time between accesses to a given row is arbitrary.
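
A toy model of that masking effect follows. It is not a real DRAM timing model: a row is treated as lost if it goes
longer than a hard 2 ms threshold without being accessed or refreshed, and any access restores the whole row, as a
RAS cycle does. The 128 rows and 2 ms match the 4116 refresh spec. The 20 us "drifted" refresh interval is
hypothetical, chosen so that one full refresh sweep takes longer than 2 ms.

```python
import math

# Toy model, not a real DRAM timing model: a row loses its data if it goes more
# than RETENTION_US without being accessed or refreshed. 128 rows and 2 ms match
# the 4116 refresh spec; the drifted interval is hypothetical.
ROWS = 128
RETENTION_US = 2000.0
NOMINAL_US = RETENTION_US / ROWS   # 15.625 us per row meets the 2 ms spec exactly
DRIFTED_US = 20.0                  # hypothetical slow timer: a 2.56 ms sweep


def last_refresh(row, t, interval_us):
    """Most recent refresh of `row` at or before time t, for a controller that
    refreshes one row every interval_us in round-robin order. Returns 0.0 (the
    initial write) if the controller has not reached this row yet."""
    first = row * interval_us
    if t < first:
        return 0.0
    sweep = ROWS * interval_us
    return first + math.floor((t - first) / sweep) * sweep


def run(accesses, interval_us):
    """Replay (time_us, row) accesses in time order and return those that would
    have read decayed data. An access restores its row, just as a refresh does."""
    last_access = [0.0] * ROWS     # every row written at t = 0
    failures = []
    for t, row in accesses:
        restored = max(last_access[row], last_refresh(row, t, interval_us))
        if t - restored > RETENTION_US:
            failures.append((t, row))
        last_access[row] = t
    return failures


# Diagnostic-style loop: every row is touched once per 128 us pass.
diag = [(float(p * ROWS + r), r) for p in range(50) for r in range(ROWS)]
# "Real-world" access: row 5 is not touched again until t = 2.6 ms.
idle = [(2600.0, 5)]

print(run(diag, DRIFTED_US))   # -> []  the fast loop masks the slow refresh
print(run(idle, DRIFTED_US))   # -> [(2600.0, 5)]  the same fault now loses data
print(run(idle, NOMINAL_US))   # -> []  a correctly timed refresh keeps the row
```

The diagnostic loop passes on the faulty refresh only because it revisits every row every 128 us, far inside the
retention window; a single row left alone for longer than one slow sweep is enough to expose the fault.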

Refresh on some early boards/systems was timed asynchronously by monostables or on-board oscillators, which can
drift or degrade into marginal operation. (I don't know what DEC's design policy was for DRAM refresh.)
It might also explain why a number of 4116s were (apparently) failing earlier in the effort, if I recall the discussion
correctly: replacing them may simply have swapped in 'slightly better' chips, i.e. ones with a slightly longer refresh tolerance.
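
To put rough numbers on the drift argument, here is a back-of-envelope sketch. It assumes a 4116-style part that
needs all 128 rows refreshed within its 2 ms spec. The 14 us nominal interval, the 30 % drift and the 3 ms retention
of a "better" chip are all hypothetical figures for illustration, not measurements from this board.

```python
# Back-of-envelope refresh margin, assuming a 4116-style part that needs all 128
# rows refreshed within its 2 ms spec. The 14 us nominal interval, the drift
# figures and the 3 ms "better chip" retention are hypothetical.
REFRESH_ROWS = 128


def sweep_ms(interval_us, drift):
    """Time to refresh all rows once when the timer's nominal interval has
    drifted slow by the fraction `drift` (0.3 means 30 % slow)."""
    return REFRESH_ROWS * interval_us * (1.0 + drift) / 1000.0


def chip_survives(retention_ms, interval_us, drift):
    """True if a chip that really holds data for retention_ms keeps working."""
    return sweep_ms(interval_us, drift) <= retention_ms


for drift in (0.0, 0.3):
    print(f"drift {drift:.0%}: sweep {sweep_ms(14.0, drift):.2f} ms, "
          f"2 ms chip ok={chip_survives(2.0, 14.0, drift)}, "
          f"3 ms chip ok={chip_survives(3.0, 14.0, drift)}")
```

In this toy case a 30 % slow timer pushes the sweep past the 2 ms spec, yet a replacement chip that happens to retain
data for 3 ms keeps working, which would look exactly like "the new 4116s fixed it".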