On Sun, May 17, 2015 at 06:16:22PM -0400, Noel Chiappa wrote:
>> Even if E11's mP feature is officially unsupported, it was a *crazy*
>> amount of work
>
> What made it so much work? (Just curious about the technical aspects...)

Everything Johnny said is true -- the fact that the IIST docs hadn't
surfaced at the time was the biggest single problem, and dancing around
the race-condition thing with the 16-bit spin counter was a big deal
(especially when emulating a mP PDP-11 on a single-CPU PC -- so the other
CPU(s) won't respond to pokes for many milliseconds and I had to kludge a
way for CPUs to unknowingly wait for each other, which I'll bet is the
cause of the mP stability problems), but it helped that the RSX sources
are commented and I have IIST diag listings on fiche (and the binaries are
on XXDP -- for something that wasn't officially released or supported, the
11/74 sure shows up in a lot of places, including the 11M+ doc set).
And yes, sorting out booting through the IIST was quite a piece of magic.
But the hugest amount of busywork was just reworking every shred of CPU
emulation code (including the code generator that recompiles the most common
instruction cases from scripts after each SET CPU command and repopulates
the 64K dispatch table) to store all CPU/MMU/clock/SR/EAE/etc. context in a
block (it had previously been scattered all over global static storage,
but now it lugs tons of stuff around with each CPU), and find
all the places that have to make sure they have the pointer to that block.
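
To make the shape of that change concrete, here is a minimal sketch in C. It is not E11's actual code: the struct layout, the names (cpu_ctx, op_handler, dispatch_init, cpu_exec) and the single MOV case are all assumptions made up for illustration. The point is only that every handler reached through the 64K dispatch table takes a pointer to its CPU's context block instead of touching globals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU context block; the fields are illustrative only. */
typedef struct cpu_ctx {
    uint16_t r[8];   /* R0-R5, SP, PC */
    uint16_t psw;    /* processor status word */
    int      id;     /* 0 = CPA, 1 = CPB, ... */
} cpu_ctx;

/* Every handler gets the context pointer explicitly. */
typedef void (*op_handler)(cpu_ctx *cpu, uint16_t opcode);

/* One entry per possible 16-bit opcode. */
static op_handler dispatch[1u << 16];

/* MOV Rs,Rd, register mode only (0100SD octal); condition codes omitted. */
static void op_mov_rr(cpu_ctx *cpu, uint16_t opcode)
{
    unsigned src = (opcode >> 6) & 7;
    unsigned dst = opcode & 7;
    cpu->r[dst] = cpu->r[src];
}

/* Placeholder for everything this sketch does not implement. */
static void op_unimplemented(cpu_ctx *cpu, uint16_t opcode)
{
    (void)cpu;
    (void)opcode;
}

/* Populate the table; a real emulator would regenerate it per CPU type. */
static void dispatch_init(void)
{
    for (uint32_t op = 0; op < (1u << 16); op++) {
        /* Top four bits 0001, source and destination modes both 0. */
        int is_mov_rr = (op & 0177070) == 0010000;
        dispatch[op] = is_mov_rr ? op_mov_rr : op_unimplemented;
    }
}

/* Execute one already-fetched opcode on the given CPU. */
static void cpu_exec(cpu_ctx *cpu, uint16_t opcode)
{
    assert(cpu != NULL);
    dispatch[opcode](cpu, opcode);
}
```

With two cpu_ctx instances, running the same opcode through cpu_exec on each one changes only that instance's registers, which is the property the global-static layout could not give.
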
Even worse was rigging all the device emulations to have a concept of which
bus they're on (for things like "set dub: bus=b" to move the second MSCP
controller onto CPB's Unibus, etc.). That meant massaging a *ton* of code.
While I was at it I modified the concept of "default" CSRs/vectors so they
start over from scratch on each bus (Johnny was great with bouncing ideas
back and forth on that and I'm really happy with how it worked out). As
always, you can use SET ddc: CSR=xxxxxx VEC=yyy to move them wherever you
want, but the defaults are sensible (including recalculating the "floating"
CSRs/vectors on any change) so things land where they belong 99% of the
time, within each CPU (as opposed to having CPA's instance of a controller
at the usual address but CPB's controller at another address just
because it's the second controller, even though it's *CPB's* first
controller). That also meant allowing the "DEFAULT" keyword for CSR=
and VEC=, and splitting out the tables so that E11 will still remember
the default values even after you've been monkeying with them (since they
might still apply on a different CPU).
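
A rough sketch of the per-bus default idea, in C. Everything here is hypothetical (the ctrl struct, the CSR_DEFAULT sentinel, the function names), and the second table entry is only a placeholder rather than a computed floating address. What it shows is the default table being kept apart from any user override, with a controller's position counted per bus rather than globally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no override, use the default table". */
enum { CSR_DEFAULT = 0 };

typedef struct {
    int      bus;           /* 0 = CPA's Unibus, 1 = CPB's, ... */
    uint32_t csr_override;  /* CSR_DEFAULT or a user-chosen address */
} ctrl;

/* Defaults for the Nth controller of this type on any one bus.
 * 172150 is the usual first MSCP CSR; the second value is a placeholder. */
static const uint32_t default_csr[2] = { 0172150, 0160334 };

/* How many controllers of this type come earlier on the same bus. */
static int index_on_bus(const ctrl *all, int which)
{
    int idx = 0;
    for (int i = 0; i < which; i++)
        if (all[i].bus == all[which].bus)
            idx++;
    return idx;
}

/* Override wins; otherwise the default restarts from scratch on each bus. */
static uint32_t effective_csr(const ctrl *all, int which)
{
    if (all[which].csr_override != CSR_DEFAULT)
        return all[which].csr_override;
    int idx = index_on_bus(all, which);
    assert(idx < 2);
    return default_csr[idx];
}
```

Because the table is never overwritten, setting csr_override back to CSR_DEFAULT after moving a controller to another bus picks up whatever default is right for its new position.
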
I have yet to mechanize Unibi beyond D. 11M+ supports up to 16 of them
(the "URM" field in CON is a bitmap of them), with the last potential 12
being on the other sides of DT03/DT07 bus switches. Um, I'll get to it!
It'll help if I get to understand BSDRV (there's a place where it fiddles
with memory floating above the top of the stack, which I can't imagine is
intentional). I know there's no practical need (the point of those
switches is to dynamically move peripherals off of failed CPUs and onto
healthy ones w/o shutting the system down, and that makes no sense under
emulation -- the whole idea of the 11/74 was not speed but minimizing
downtime) but as always that's not the point! Similarly, there's no MKA11
emulation -- it's Just Memory. Which works fine (no need to move it around
to avoid failing CPUs) but again it'd be nice to emulate that just for
completeness.
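
For readers who haven't met a run bitmap like URM, a small C illustration follows. The bit assignment (bit 0 for the first Unibus run, and so on up to 16) is an assumption for the sake of the example, as are the function names; the real field layout in 11M+ should be checked against the sources.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_RUNS = 16 };  /* 11M+ supports up to 16 Unibus runs */

/* Nonzero if the given run (0-based, assumed bit order) is in the mask. */
static int urm_has_run(uint16_t urm, int run)
{
    assert(run >= 0 && run < MAX_RUNS);
    return (urm >> run) & 1;
}

/* Return the mask with the given run added. */
static uint16_t urm_add_run(uint16_t urm, int run)
{
    assert(run >= 0 && run < MAX_RUNS);
    return (uint16_t)(urm | (1u << run));
}

/* Count how many runs the mask names. */
static int urm_count(uint16_t urm)
{
    int n = 0;
    while (urm) {
        n += urm & 1;
        urm >>= 1;
    }
    return n;
}
```
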
There's also all kinds of fiddliness with having multiple (per-CPU)
versions of the event-handling queues (I/O complete, keystroke received,
timer expired, etc. etc. etc.), and ways to pass off requests (either by
enqueueing them to other CPUs or by calling a routine that appears to
"return" in another thread). So all the places that used to make a note
to have a grown-up deal with something for us, now have to know *which*
grown-up they want to do it.
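
The per-CPU queue arrangement can be sketched roughly like this in C. All names (ev_queue, post_event, next_event) and sizes are invented for the example, and it leaves out the locking a real multi-threaded emulator would need around each queue. The essential difference from the uniprocessor case is that the poster has to name a target CPU.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CPUS = 4, QUEUE_LEN = 16 };

typedef enum { EV_IO_COMPLETE, EV_KEYSTROKE, EV_TIMER } ev_kind;

typedef struct {
    ev_kind kind;
    int     arg;
} event;

/* One fixed-size ring buffer per emulated CPU. */
typedef struct {
    event  slot[QUEUE_LEN];
    size_t head;
    size_t count;
} ev_queue;

static ev_queue queues[MAX_CPUS];

/* Enqueue an event for one specific CPU; returns 0 if its queue is full. */
static int post_event(int cpu, event ev)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    ev_queue *q = &queues[cpu];
    if (q->count == QUEUE_LEN)
        return 0;
    q->slot[(q->head + q->count) % QUEUE_LEN] = ev;
    q->count++;
    return 1;
}

/* Dequeue the oldest event for this CPU; returns 0 if none is waiting. */
static int next_event(int cpu, event *out)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    ev_queue *q = &queues[cpu];
    if (q->count == 0)
        return 0;
    *out = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return 1;
}
```
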
Later adding the PDP-11/45 version of mP was even more fun and not nearly
so hard. 99% of the 11/74 stuff applied; it just needed to have a way to
have separate RAM space for each CPU, and it needed the FASTBUS: pseudo-
device since on a real PDP-11/45 it's possible to uncable the DMA interface
to one processor's local memory bus from that processor's Unibus and insert
it into the Unibus of a different processor (very weird but used in some
flight simulators). So that's just a RAM device, but since the 11/45
doesn't have cache, *every* write or read-modify-write instruction to
that RAM has to be protected by a lock so they won't trip each other up
(which ironically makes it much slower than regular non-"Fast" RAM, but
still faster than a real 11/45). And I added the IPL: device since flight
simulators like to have a pair of DR11Cs or DR11Bs/DR11Ws linking the two
CPUs (and doing an all-software interprocessor link is easy).
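
The locking around shared RAM might look something like the C sketch below. This is an assumption-laden illustration, not the FASTBUS: implementation: it uses a single C11 atomic_flag spinlock for the whole block, and the names (shared_ram, fast_write, fast_inc) are made up. It shows why the shared block costs more than private RAM: even a plain store has to take the lock so it can't land in the middle of another CPU's read-modify-write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { SHARED_WORDS = 4096 };

static uint16_t    shared_ram[SHARED_WORDS];
static atomic_flag ram_lock = ATOMIC_FLAG_INIT;

/* Spin until the flag is ours. */
static void ram_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&ram_lock, memory_order_acquire))
        ;  /* busy-wait */
}

static void ram_release(void)
{
    atomic_flag_clear_explicit(&ram_lock, memory_order_release);
}

/* Plain write: still locked, so it cannot interleave with an RMW. */
static void fast_write(size_t word, uint16_t value)
{
    assert(word < SHARED_WORDS);
    ram_acquire();
    shared_ram[word] = value;
    ram_release();
}

/* Read-modify-write, e.g. the memory side of an INC instruction. */
static uint16_t fast_inc(size_t word)
{
    assert(word < SHARED_WORDS);
    ram_acquire();
    uint16_t v = (uint16_t)(shared_ram[word] + 1);
    shared_ram[word] = v;
    ram_release();
    return v;
}

/* Reads of an aligned word are left unlocked in this sketch. */
static uint16_t fast_read(size_t word)
{
    assert(word < SHARED_WORDS);
    return shared_ram[word];
}
```
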
The 11/45 mP can work totally independently too (you don't have to bother
with FASTBUS: memory blocks or IPLs connecting the processors). At one
point I booted RSTS, RSX11M+, and RT-11 simultaneously on a *DOS* quad-core
PC (so no time-slicing at all -- and all I/O requests are passed to/from
the boot processor where the DOS calls happen), just so I could have a
giggle fit over how stupid that was. I've found that PC MPS BIOSes are
very inconsistent though (so what gets one motherboard to kick the other
cores to life won't work on another, so DOS isn't the place to play with
mP, even though it's hilarious when it works) and I haven't tackled ACPI
yet (it's huge).
But even though it's not worth using (and mostly works only on particular
motherboards that I've tested/debugged the code on), adding support for SMP
PCs on the DOS version was another giant huge slab of work. What I'm doing
is definitely wrong (keeping the 8259As in charge of interrupts instead of
the IOAPICs) but I have no choice since it's important that DOS (and any
loaded drivers or TSRs) have no idea what's afoot, since it's not qualified
to deal with it. So all hardware interrupts are handled on the boot
processor (in case they end up being handled in V86 mode by a driver
outside E11, which will want to send an end-of-interrupt command to the
8259A), and all DOS/BIOS/ASPI/DPMI/Packet Driver calls are made there too
(which means there's a queueing system for moving requests from other CPUs
to the boot processor and sending the results back).
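
As an illustration of that request hand-off, here is a single-threaded C sketch. The request struct, submit_request, boot_cpu_service and fake_dos_call are all hypothetical names, and the cross-CPU synchronization a real implementation needs is deliberately left out. The submitting CPU gets control back immediately and checks the done flag later, while only the boot processor ever runs the call itself.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_PENDING = 8 };

typedef struct {
    int from_cpu;            /* which CPU asked */
    int (*call)(int arg);    /* stand-in for a DOS/BIOS/driver call */
    int arg;
    int result;              /* filled in by the boot processor */
    int done;                /* set once result is valid */
} request;

static request *pending[MAX_PENDING];
static size_t   n_pending;

/* Called on any CPU: queue the request and return without waiting. */
static int submit_request(request *rq)
{
    if (n_pending == MAX_PENDING)
        return 0;
    rq->done = 0;
    pending[n_pending++] = rq;
    return 1;
}

/* Called only on the boot processor: run the calls, post the results. */
static void boot_cpu_service(void)
{
    for (size_t i = 0; i < n_pending; i++) {
        request *rq = pending[i];
        rq->result = rq->call(rq->arg);
        rq->done = 1;
    }
    n_pending = 0;
}

/* Dummy "system call" so the sketch has something to run. */
static int fake_dos_call(int arg)
{
    return arg * 2;
}
```
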
A funny side-effect of that is that DOS file I/O calls made on behalf of
CPA are blocking as usual (*why* don't supposedly modern OSes have .READC
or .WRITC for files?), but CPB and later effectively have non-blocking file
I/O since life goes on while they're waiting for CPA to finish doing its
thing and call back. So I haven't benchmarked this but it makes sense
that a single-processor system running on CPB (with CPA sitting halted)
would have better throughput than the same system running in uniprocessor
mode (I'm talking DOS/stand-alone here -- on the other OSes it uses helper
threads to simulate non-blocking file I/O, so there's no difference between
CPA and CPB/later).
ANYWAY so it was a crazy amount of busywork, a million miles beyond just
having extra copies of R0-5/SP/PC. And originally I thought it'd have to
be a build option in a separate version so as not to slow down the
main/official version of E11 just to support a feature virtually no one
uses, but it turns out that on current x86es, references to [EBP+nn] are
actually slightly *faster* than absolute references to DS:[nnnnnnnn] (no
matter what Intel tells us, it always seems to boil down to the # of opcode
bytes), so there's no speed penalty at all in using the SMP code for a UP
emulation.
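
The opcode-byte difference is easy to see by writing out the encodings. The C fragment below just stores the machine code for two equivalent 32-bit loads as byte arrays; the array names and the example offsets are arbitrary. Note that the 3-byte form only applies when the offset fits in a signed 8-bit displacement; larger offsets from EBP need a 32-bit displacement and come out the same length as the absolute form.

```c
#include <assert.h>
#include <stdint.h>

/* mov ecx,[ebp+0x10]: opcode 8B, ModRM 4D (mod=01, reg=ecx, rm=ebp), disp8 */
static const uint8_t mov_ecx_ebp_disp8[] = { 0x8B, 0x4D, 0x10 };

/* mov ecx,[0x12345678]: opcode 8B, ModRM 0D (mod=00, rm=101), disp32 */
static const uint8_t mov_ecx_abs32[] = { 0x8B, 0x0D, 0x78, 0x56, 0x34, 0x12 };

/* The context-relative form is half the size of the absolute one. */
static_assert(sizeof mov_ecx_ebp_disp8 == 3, "EBP+disp8 form is 3 bytes");
static_assert(sizeof mov_ecx_abs32 == 6, "absolute disp32 form is 6 bytes");
```
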
John Wilson
D Bit