On Thu, Jul 23, 2015 at 7:25 PM, Andrew Jones <andrew at jones.ec> wrote:
> The photos of your home-made QUIP socket were really cool.
I haven't made any QUIP sockets yet, though I plan to do so. The
photos are of a footprint adapter which has an actual 3M 3534 QUIP
socket plugged into machined-pin socket strips.
> I reached out today because I've been wanting to hack on a 432 emulator
> for some years, ever since I found some contemporary textbooks about the
> architecture.
While I consider the Architecture Reference Manuals to be quite good,
they do still leave uncertainty about some details that will be
necessary to make an accurate emulator. I hope to use the FPGA-based
setup to analyze how the release 1.0 components actually handle those
details.
One tricky thing about doing an emulator, if you wanted to actually
take advantage of a multicore/SMP system to simulate multiple 432
processors (either GDP or IP), is that the memory system is expected to
provide atomicity of all reads and writes, of sizes of 1, 2, 4, 6, 8,
or 10 bytes, with no alignment restrictions. This isn't just for
interlocked RMW accesses; when one processor is doing a write, a
second processor reading the same addresses or an overlapping address
range is never allowed to see the first processor's write in a
partially completed state.
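
As an illustration of that requirement, here is a minimal sketch of a lock-based memory model that never exposes a torn read or write. It is only one possible fallback and not how any existing emulator does it; the class name AtomicMemory, the 16-byte stripe size, and the method names are all assumptions made for the example.

```python
import threading


class AtomicMemory:
    """Byte-addressed memory whose reads and writes never appear torn.

    Hypothetical sketch: each 16-byte stripe has its own lock, and an
    access takes the lock of every stripe it touches.
    """

    STRIPE = 16  # bytes guarded by one lock (arbitrary choice)
    SIZES = (1, 2, 4, 6, 8, 10)  # access sizes named in the text

    def __init__(self, size):
        self._data = bytearray(size)
        count = (size + self.STRIPE - 1) // self.STRIPE
        self._locks = [threading.Lock() for _ in range(count)]

    def _stripes(self, addr, length):
        if length not in self.SIZES:
            raise ValueError("unsupported access size")
        if addr < 0 or addr + length > len(self._data):
            raise IndexError("access out of range")
        first = addr // self.STRIPE
        last = (addr + length - 1) // self.STRIPE
        # Ascending order: every thread locks stripes in the same
        # order, so two overlapping accesses cannot deadlock.
        return range(first, last + 1)

    def read(self, addr, length):
        stripes = self._stripes(addr, length)
        for i in stripes:
            self._locks[i].acquire()
        try:
            return bytes(self._data[addr:addr + length])
        finally:
            for i in reversed(stripes):
                self._locks[i].release()

    def write(self, addr, payload):
        stripes = self._stripes(addr, len(payload))
        for i in stripes:
            self._locks[i].acquire()
        try:
            self._data[addr:addr + len(payload)] = payload
        finally:
            for i in reversed(stripes):
                self._locks[i].release()


mem = AtomicMemory(64)
mem.write(11, bytes(range(10)))  # unaligned 10-byte write spanning two stripes
value = mem.read(11, 10)
```

A 10-byte access touches at most two stripes, so the cost is one or two lock acquisitions per access, paid on every read as well as every write.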
As far as I can tell, on the x86 there is no simple way to guarantee
that kind of memory access semantics for a 10-byte access. CMPXCHG16B
can do it for an unaligned access that lies entirely within one
16-byte block, but the memory operand of the CMPXCHG16B instruction
has to be 16-byte aligned, so it can't cover the alignments at which
10-byte data straddles a 16-byte boundary. Intel's Transactional
Synchronization Extensions (TSX), which only work correctly on very
recent processors, could be used to solve the problem.
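
The alignment limitation is easy to check with a little arithmetic. The sketch below lists, for a given access size, the byte offsets within a 16-byte aligned block at which the access would cross into the next block and so could not be covered by a single 16-byte aligned operand. The function names are made up for this example.

```python
BLOCK = 16  # CMPXCHG16B operates on one 16-byte aligned block


def fits_in_aligned_block(addr, size, block=BLOCK):
    """True if bytes addr .. addr+size-1 all lie in one aligned block."""
    return addr // block == (addr + size - 1) // block


def straddling_offsets(size, block=BLOCK):
    """Offsets within a block at which an access of `size` bytes
    crosses into the following block."""
    return [off for off in range(block)
            if not fits_in_aligned_block(off, size, block)]


ten = straddling_offsets(10)    # offsets 7 through 15
eight = straddling_offsets(8)   # offsets 9 through 15
```

So 9 of the 16 possible alignments of a 10-byte operand cannot be handled this way, and even an 8-byte operand is only covered at offsets 0 through 8.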
> Unfortunately, object code in good condition is very hard to find.
I haven't found any that is actually usable or even in a format
conducive to analysis. That's why I'm working on my own image builder,
including assembler functionality. It takes an XML description of a
432 image and compiles it into a binary image. If I ever get that
actually working, then I'll think about a compiler for a normal
programming language of some sort (or a subset thereof), producing the
XML file for the image builder as an intermediate step.
> I was wondering if I could have a copy of your microcode dump.
Email sent.
> Being microcode, it is not object code in the most meaningful sense of
> the word, but it's a lot more than I've been able to locate so far.
As is typical with microcode, it's far more difficult to understand
than a normal machine language. The only published details are from a
patent filed more than two years prior to the iAPX 432 public release,
and there were some non-trivial architectural and microarchitectural
changes in that interval, such that the microinstruction descriptions
in the patent don't completely match those of the release 1.0
components. Still, without the patent, trying to understand the
microcode would be a near hopeless task.
The published papers on the design of the 43201 variously claim that
it has 3.5K, 3.75K or 4K words (by 16 bits) of microcode ROM. The
release 1.0 43201 actually has 4K, so the 3.5K and 3.75K numbers were
probably referring to the amount actually used in prerelease chips,
rather than the physical layout (maximum words possible).
The entry point at address 0x000 is used for initialization when the
INIT/ signal is deasserted. The first nine instructions starting there are
NOPs. It is unclear whether there is any hardware reason that there
need to be multiple NOPs there; due to pipelining[*] I could perhaps
imagine a need for two, but not nine. It may be that there were nine
words of ROM left over, and they decided to put them at the lowest
address rather than the highest.
In general the microcode is NOT the top-level control of the GDP.
Rather, microcode routines get invoked by the hardware for specific
tasks, such as some but not all of the macroinstructions, fault
handling, and aspects of memory management. The ROM entry point
addresses are in PLAs. I don't think there's any way to dump the PLAs
other than decapping the chip, but since it is vertical microcode
using a PC that normally increments, it's easy to find the targets of
microbranches and subroutine calls, and also addresses that cannot be
reached without a hardware-provided address.
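
To make that last point concrete, here is a rough sketch of the kind of scan involved. It does not use the real 43201 microinstruction encoding; it assumes the ROM has already been decoded into a hypothetical listing of (kind, target) pairs, that there is exactly one delay slot after each control transfer, and that a subroutine call returns to the word after its delay slot. Any address that is neither a branch/call target nor reachable by falling through from the previous word is a candidate for a hardware-provided (PLA) entry point.

```python
# Hypothetical decoded form: address -> (kind, target).
# kind is one of "seq", "branch", "cond", "call", "return";
# target is None unless the word transfers control to a fixed address.
UNCONDITIONAL = {"branch", "return"}   # never fall through afterwards
TRANSFERS = {"branch", "cond", "call"}  # carry an explicit target


def branch_targets(listing):
    """Addresses named by microbranches and subroutine calls."""
    return {target for kind, target in listing.values()
            if kind in TRANSFERS}


def hardware_entry_candidates(listing):
    """Addresses with no software predecessor.

    With one delay slot, the word at A+1 always executes after A.
    The word at A+2 is reached by fall-through unless A is an
    unconditional branch or a return.
    """
    targets = branch_targets(listing)
    candidates = []
    for addr in sorted(listing):
        if addr in targets:
            continue
        two_back = listing.get(addr - 2)
        cut_off = two_back is not None and two_back[0] in UNCONDITIONAL
        falls_through = (addr - 1) in listing and not cut_off
        if not falls_through:
            candidates.append(addr)
    return candidates


# Tiny made-up listing, not real 432 microcode.
LISTING = {
    0: ("seq", None),
    1: ("seq", None),
    2: ("call", 6),
    3: ("seq", None),     # delay slot of the call
    4: ("branch", 1),     # assumed return point of the call
    5: ("seq", None),     # delay slot of the branch
    6: ("seq", None),     # subroutine body
    7: ("return", None),
    8: ("seq", None),     # delay slot of the return
    9: ("seq", None),     # nothing in software leads here
    10: ("seq", None),
}

entries = hardware_entry_candidates(LISTING)  # [0, 9]
```

Unused or padding words would show up as candidates too, so the result is an upper bound on the entry points rather than an exact list.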
Eric
* There is a delay slot after change of control flow
microinstructions, as is common for both microcode and some RISC code.