So I've been helping Fritz look into his -11/45 problem, and things have
gotten to a point where I'd like to reach out for help, more eyes, etc.
I have to say, I spent almost a decade at the start of my career working on
PDP-11 hardware ('new build' DMA devices, as well as fixing broken stuff) and
software, and this is, I think, the most confusing and difficult problem I
have _ever_ seen on one. Hence the above...
What's _particularly_ confusing and difficult is that it seems like _three_
separate, unrelated things all go wrong at the same time (two of them at
exactly the same moment, the third very close to it). And the machine now
passes all the diagnostics that have been thrown at it, particularly the KT11
and RK11 diagnostics (why this is important will become clear). So here's what
we've found to date.
The failure we're looking at is that an attempt to execute the 'ls' command
under Unix V6 fails; it gets a memory management fault, and dumps core.
AFAICT, the shell successfully forks, and its attempt to do an exec() of 'ls'
sort of works (more below), but a few instructions in, we get the MM fault - but
there's even more wrong when that happens (details toward the end below).
I've been looking at the core dump produced by the process, which gives me the
registers at the time of the trap, the user's stack, etc - but not a copy of
the binary code - the 'ls' command is a so-called 'pure text', i.e. the binary
is segregated into separate, potentially shared, read-only 'segment(s)' (only
1 in this case) of the PDP-11's User mode address space, and is not included
in the process dump.
(I use the term 'segment', which is actually what DEC called them in the first
version of the PDP-11/45 processor handbook, because that's what they are:
variable-length segments, not fixed-size pages as on most systems. I assume
they changed to 'page' for marketing reasons. And please, can we hold off on
debating this and focus on the problem? Thanks! :-)
I do have the ability to look at the binary that it _should_ be executing, by
examining the command in its file. Also, Fritz has worked out that he can
patch the MM trap vector (before trying to do the 'ls') to halt the machine
when it happens, so he can read out all the KT11 registers, look at the actual
program in main memory, etc.
First oddity - the problem is dependent on the location of the command in main
memory! If Fritz says "sleep 360 &", to run a trivial command in the
background, and _then_ says 'ls' - it works (so we know the binary of 'ls' on
disk is OK)! We _think_ this is because the process executing the 'sleep'
takes up a chunk of main memory, and thus changes the location of the process
executing the 'ls'.
The problem is that I'm reluctant to try and change anything (e.g. to have the
OS print out anything), because that will change the location of things, and
we may (likely will?) no longer get the problem. With nothing changed, it
_reliably_ fails - I've looked at two different core dumps, and all the
essential data (registers, user stack etc) are identical. The KT11 registers
all seem to be the same, too.
So, on to details.
I'm pretty sure the command only gets a few instructions in before it blows
up. Here are the process' registers, and the _entire_ contents of the user
mode stack:
R0 177770
R1 0
R2 0
R3 0
R4 34
R5 444
SP 177760
PC 010210
060: 000000 000020 000001 177770 177774 177777 071554 000000
010210 turns out to be the address of the first word in 'csv', an internal
routine which PDP-11 C uses to build a stack frame - _every_ C routine starts
with a "JSR R5, CSV" instruction as the first thing it does.
So looking at the stack (which looks good; it contains a valid 'argc' and 'argv'
that the process would be started with), and the registers, I'm pretty sure
it does these starting instructions OK:
start:
setd
mov sp,r0
mov (r0),-(sp)
tst (r0)+
mov r0,2(sp)
jsr pc,_main
_main:
jsr r5,csv
and then blows up on:
csv:
mov r5,r0
So it's the 8th instruction in that blows up (*): but not only is what's in
memory at that location _not_ 'mov r5,r0', it also gets an MM trap that
makes no sense.
(*: In user mode: if you don't have an FPP, the first one will trap, which
UNIX ignores.)
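As a cross-check on the claim that the first seven instructions ran correctly, here is a minimal C sketch that replays just their stack effects (plus the R5 push done by 'jsr r5,csv') and compares the result with the eight words in the dump. It assumes the initial SP was 0177766 (the final R0 minus 2), that exec() laid out argc, argv[0], a -1 terminator and the string "ls" exactly as the dump shows, that the return PC pushed by 'jsr pc,_main' is 020, and that R5 was 0 on entry. The names check_post_values, rd and wr are hypothetical, not from any real tool.

```c
#include <assert.h>
#include <stdint.h>

/* User address space modelled as 16-bit words; index = byte address / 2. */
static uint16_t mem[0200000 / 2];

static uint16_t rd(uint16_t a) { return mem[a >> 1]; }
static void wr(uint16_t a, uint16_t v) { mem[a >> 1] = v; }

/* Replays only the stack effects of the start-up code and "jsr r5,csv",
   under the assumptions stated above, and checks them against the dump. */
void check_post_values(void)
{
    uint16_t sp = 0177766, r0;

    wr(0177766, 1);                /* argc */
    wr(0177770, 0177774);          /* argv[0], points at "ls" */
    wr(0177772, 0177777);          /* -1 terminator, as in the dump */
    wr(0177774, ('s' << 8) | 'l'); /* "ls": low byte holds first char */
    wr(0177776, 0);                /* NUL */

                                   /* setd: no register/stack effect */
    r0 = sp;                       /* mov sp,r0 */
    sp -= 2; wr(sp, rd(r0));       /* mov (r0),-(sp): copy argc */
    r0 += 2;                       /* tst (r0)+: r0 now points at argv */
    wr(sp + 2, r0);                /* mov r0,2(sp): store argv */
    sp -= 2; wr(sp, 020);          /* jsr pc,_main: push return PC */
    sp -= 2; wr(sp, 0);            /* jsr r5,csv: push old R5 */

    /* The eight words starting at 0177760, from the core dump. */
    static const uint16_t dump[8] = {
        000000, 000020, 000001, 0177770, 0177774, 0177777, 071554, 000000 };
    assert(sp == 0177760 && r0 == 0177770);
    for (int i = 0; i < 8; i++)
        assert(rd(0177760 + 2 * i) == dump[i]);
}
```

If the asserts hold, the stack and R0 in the dump are exactly what those instructions would leave behind, which is the basis for saying the start-up code ran OK.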
Fritz has looked at the KT11 registers when the trap happens, and the PARs and
PDRs all look good. The SSRs contain:
> SSR's: 040143 000000 010210 000000
SSR2 gives the PC at the time of the fault (again 010210); SSR0 shows:
Abort - segment (page) length error
User mode
Segment (Page) 1
which is the first thing that's wrong - neither the instruction that's
_supposed_ to be there (next), nor the one that's _actually_ there, contains
any reference to segment 1!
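To make the SSR0 reading concrete, here is a small C sketch that unpacks 040143 into its fields and checks which segment the faulting PC lives in. The bit assignments are my reading of the memory management chapter of the 11/45 handbook (the maintenance and 'instruction completed' bits are ignored), so they should be checked against the manual; ssr0_decode, va_segment and check_post_values are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* SSR0 fields, as assumed here for the 11/45 KT11. */
struct ssr0_info {
    int abort_nonres;   /* bit 15: abort, non-resident */
    int abort_length;   /* bit 14: abort, segment length error */
    int abort_readonly; /* bit 13: abort, read-only violation */
    int mode;           /* bits 6-5: 0 = kernel, 1 = supervisor, 3 = user */
    int dspace;         /* bit 4: 0 = I space, 1 = D space */
    int segment;        /* bits 3-1: segment (page) number */
    int enabled;        /* bit 0: relocation enabled */
};

struct ssr0_info ssr0_decode(uint16_t s)
{
    struct ssr0_info i;
    i.abort_nonres   = (s >> 15) & 1;
    i.abort_length   = (s >> 14) & 1;
    i.abort_readonly = (s >> 13) & 1;
    i.mode           = (s >> 5) & 3;
    i.dspace         = (s >> 4) & 1;
    i.segment        = (s >> 1) & 7;
    i.enabled        = s & 1;
    return i;
}

/* Each segment spans 8 KB (020000 bytes), so the top 3 bits of a
   16-bit virtual address select it. */
int va_segment(uint16_t va) { return va >> 13; }

void check_post_values(void)
{
    struct ssr0_info i = ssr0_decode(040143);
    assert(i.abort_length && !i.abort_nonres && !i.abort_readonly);
    assert(i.mode == 3 && !i.dspace && i.segment == 1 && i.enabled);
    assert(va_segment(010210) == 0);  /* the PC in SSR2 is in segment 0 */
}
```

Under this reading the trap is a length error on User I-space segment 1, while the PC that SSR2 reports sits in segment 0.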
The _actual_ code it's trying to execute is:
> 171600: 016162 004767 000224 000414 006700 006152 006702 006144
(Per UISA0, the text base is 0161400; adding the PC of 010210 gives us
0171610, which is right in the middle there.) That does not, alas, look
anything _at all_ like what's _supposed_ to be there, which is:
010200: 110024
10400 mov r4,r0
167 jmp 10226 (cret)
16
PC-> 10500 mov r5,r0 (start of CSV)
10605 mov sp,r5
10446 mov r4,-(sp)
10346 mov r3,-(sp)
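The relocation arithmetic from the UISA0 note can be written out as a short C sketch. It assumes UISA0 holds 01614 (the 0161400 base expressed in 64-byte blocks) and uses the usual KT11 split of a virtual address into segment, block and byte-in-block fields; kt11_phys and check_post_values are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed KT11 split of a 16-bit virtual address:
   bits 15-13 segment number (selects the PAR/PDR pair),
   bits 12-6  block number within the segment,
   bits 5-0   byte offset within a 64-byte block.
   The PAR gives the segment base in 64-byte blocks; the result is an
   18-bit physical address. */
uint32_t kt11_phys(uint16_t par, uint16_t va)
{
    uint32_t block  = (va >> 6) & 0177;
    uint32_t offset = va & 077;
    return ((((uint32_t)par + block) << 6) | offset) & 0777777;
}

void check_post_values(void)
{
    /* The words Fritz read out starting at physical 0171600. */
    static const uint16_t dump[8] = {
        016162, 004767, 000224, 000414, 006700, 006152, 006702, 006144 };
    uint32_t pa = kt11_phys(01614, 010210);   /* UISA0 assumed = 01614 */

    assert(pa == 0171610);
    assert(dump[(pa - 0171600) / 2] == 006700); /* SXT R0, not MOV R5,R0 */
}
```

So the word the CPU should have fetched at that PC is the fifth one in Fritz's dump, 006700.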
So somehow the command (at least, this part of it - Fritz is going to check on
the first few instructions, but I'm pretty sure they will be OK) has gotten
read in wrong - but that's the least of our problems! 06700 is 'SXT R0', and
neither that nor 'MOV R5, R0' can _possibly_ cause an MM violation - least of
all one on segment 1 (this code is in segment 0)!
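A tiny C decoder for just the two instructions in question shows why neither should be able to fault by itself: both use register mode (mode 0) for every operand, so the only memory reference either makes is the instruction fetch itself, from segment 0. decode() is a hypothetical sketch that handles only MOV and SXT in register mode and reports anything else as "?"; it is not a general PDP-11 disassembler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mini-decoder: MOV (01SSDD) and SXT (0067DD) only,
   register mode (mode 0) operands only; anything else gives "?". */
void decode(uint16_t w, char *out, size_t n)
{
    if ((w & 0170000) == 0010000 && (w & 0007070) == 0) {
        /* MOV with source mode 0 and destination mode 0 */
        snprintf(out, n, "mov r%u,r%u",
                 (unsigned)((w >> 6) & 7), (unsigned)(w & 7));
    } else if ((w & 0177770) == 0006700) {
        /* SXT with destination mode 0 */
        snprintf(out, n, "sxt r%u", (unsigned)(w & 7));
    } else {
        snprintf(out, n, "?");
    }
}

void check_post_values(void)
{
    char buf[32];
    decode(010500, buf, sizeof buf);   /* what should be at 010210 */
    assert(strcmp(buf, "mov r5,r0") == 0);
    decode(006700, buf, sizeof buf);   /* what is actually there */
    assert(strcmp(buf, "sxt r0") == 0);
}
```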
I could see there having been an error reading in the command binary (e.g. maybe
the RK11 has an issue), but WTF is happening here?
Just to make things triply confusing, R5 contains trash! The 'JSR R5, CSV'
_should_ have put the return PC in R5; but that call to CSV is at 030, so R5
_should_ contain 034, not 0444.
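The expected R5 value follows from JSR's defined behaviour (push the link register, copy the updated PC into it, jump to the target). Below is a minimal C sketch of just that step, assuming SP was 0177762 after 'jsr pc,_main', R5 was 0, and the two-word 'jsr r5,csv' at 030 leaves the PC at 034 when it executes; struct cpu, jsr and check_post_values are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal machine state: r[6] is SP, r[7] is PC; mem is indexed by
   (byte address / 2). */
struct cpu { uint16_t r[8]; };
static uint16_t mem[0200000 / 2];

/* JSR reg,dst: tmp <- dst; -(SP) <- reg; reg <- PC; PC <- tmp.
   PC must already point past the whole instruction. */
void jsr(struct cpu *c, int reg, uint16_t target)
{
    c->r[6] -= 2;
    mem[c->r[6] >> 1] = c->r[reg];
    c->r[reg] = c->r[7];
    c->r[7] = target;
}

void check_post_values(void)
{
    struct cpu c = {{0}};           /* R5 == 0 on entry */
    c.r[6] = 0177762;               /* SP after jsr pc,_main */
    c.r[7] = 034;                   /* PC after fetching jsr r5,csv at 030 */

    jsr(&c, 5, 010210);             /* csv starts at 010210 */

    assert(c.r[6] == 0177760);      /* matches SP in the dump */
    assert(mem[0177760 >> 1] == 0); /* old R5, matches the dump */
    assert(c.r[7] == 010210);       /* matches PC in the dump */
    assert(c.r[5] == 034);          /* the dump shows 0444 here instead */
}
```

The model reproduces the SP, the pushed old R5 and the PC from the dump, which is what makes the 0444 in R5 so odd.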
Needless to say, this is a real head-scratcher. What's confusing the heck out
of me are the three separate issues, all happening together - the wrong code
in memory, R5 containing junk, and the spurious (?) MM trap.
The bad command binary in main memory could be caused by any number of things:
to get it, Unix reads file system blocks off the disk into buffers in low
memory, and then writes them out to the user's memory with MTPI. So an RK11
glitch could be doing it, but also a KT11 problem, etc.
I'm having a hard time seeing a common thread here - maybe a KT11 issue? But
how would that cause R5 to contain trash? That should only involve the KB11.
And the JSR R5, CSV must have been executed more-or-less OK, otherwise how did
it wind up at CSV?
I was wondering if some noise could be causing it - some sort of pattern
sensitivity - but how is it bashing R5 _and_ causing a spurious MM trap? That's
some glitch!
Most of the data above (e.g. SSR contents at trap time) has been re-checked,
and Fritz is going to check the rest (e.g. actual main memory contents for the
start of the code, and the user's stack - to check that the process' core dump
worked OK - although given the consistent stack contents, I'm expecting those
to be good).
I suggested to him that the time had come to apply the logic analyzer; I'd
love to see (from the IR in the CPU) the instruction that faults, and where it
came from. And also what the bus cycle is that's causing the fault; is it the
instruction fetch (possibly) or something that instruction is trying to do?
Does anyone have any comments/insight that could help work out what's going on
here? Or suggestions on things to look at? If so, thanks!
Noel
I don't get replies from here yet, so I have seen no replies to my posts,
nor the posts themselves.
There is a shop that has been in biz for over 25 years that is closing in
California.
I asked for anything old Apple, Sun, HP, IBM, and any old keyboards.
She will call me back tomorrow. She never dealt with the off brands, just
major maintenance contracts.
Cindy Croxton
Electronics Plus
1613 Water Street
Kerrville, TX 78028
830-370-3239 cell
sales at elecplus.com
---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
Unfortunately, they have already scrapped everything! They were distributors
of old HP and IBM hardware.
Cindy Croxton
If anyone out there needs an EIA distribution panel to go with their
DZ11, here:
https://www.ebay.com/itm/321225351590
are some of the 8-line ones (as used in the later modular back panel
system). The seller (Efi) is good people.
Noel
Daniel Fecteau
6025 Arthur Sauvé
Mirabel, Quebec
J7N 2W4
TEL: 450-969-1616 ext 101
Mail: save at savesysteme.com
He has a variety of Model M 122-key keyboards. Contact him if you are
interested.
Not affiliated with seller, etc.
Cindy Croxton
> From: Cindy Croxton
> I changed email providers, and received no emails for a week. If you
> tried to contact me, please ask again!
Perhaps 'test' was not an optimal Subject: line - a lot of people think that
flags a message they can ignore, and not even look at - which was not what
you wanted! :-)
Noel
Hi Al.
$60, plus shipping?
I live close to Paris, France. Zip code: 77310
(Is payment through PayPal the only way?)
On another subject, did you see my previous cctalk post on imaging HP 1000 L series EPROMs? Any interest?
Best regards
Gerard
I own an HP 1000 L series 200.
The cards are badly corroded at the connectors, BUT some other parts are in very good shape.
I can offer:
HP "special SoS" processors: 1AA6-60004, 1AC5, 1AB5, 1AF5 (-60001)
The set of three EPROMs...
I cannot image them, but I am willing to send them to someone who will.
Al, maybe?
The power supply seems to be in pristine condition; I would just have to check it more closely if needed.
Other parts? Just ask.
"Fees": EPROMs, free
Processors: shipping cost
Power supply: shipping cost, plus a little more for packing material and my time.
I am in France, close to Paris.
I changed email providers, and received no emails for a week. If you tried
to contact me, please ask again!
Cindy Croxton