yet another pdp-11 in fpga - test-drb@ccmp.vtda.org

24 Jun 2010

Johnny Billquist <bqt at softjar.se> wrote:
...
 > "Walter F.J. Mueller" <W.F.J.Mueller at gsi.de> wrote:
 > I've also implemented a PDP-11 on an FPGA. It is a full 11/70 with
 > split I&D, MMU and cache. No FPP so far. Available peripherals are so
 > far DL11, LP11, KW11L, PC11, and RK11. All I/O is channeled over via
 > 'remote-register-interface' onto a single bi-directional byte stream
 > interface, so the FPGA board needs a backend PC with a server program
 > to handle the I/O requests. 
...
 Any plans on the FPP? It would be really nice and useful to have.

Hi Johnny,

sure, an FPP is on the 'todo-list', but it doesn't have the highest
priority. After having put the first version on OpenCores I'd like to
add a trace/debug unit (allowing hardware breakpoints etc.), and add
a few more peripherals, especially larger disks. Currently I have
only an RK11 controller, good enough for proof-of-principle, but
not enough for real usage.

...
 As for traps and double errors, feel free to ask. I don't know if I have
 all the answers, but I might be able to figure them out. Besides, I also
 have access to one (or three) functional 11/70 machines.

I've tested much of the implementation against simh and the XXDP diagnostics,
but there are still a few loose ends regarding corner cases. It would be great
to run a few test programs on simh, a real 11/70, and my FPGA implementation,
now called w11a.

...
 The 11/53 is a really slow machine. Not that helpful to compare with. But you
 seem to push a nice number anyway. But 50MHz... The J11 in an 11/9x machine
 runs at 20 MHz, which would suggest that you should only be able to push about
 2.5 times the performance, unless you do some more clever tricks.
 (The 11/9x machine runs all memory as cache.)

I know, but the 11/53 is the only PDP-11 for which I know the Unix Benchmark
and thus the Dhrystone results, so it became the reference.

Even though my implementation is quite different from the organization of
the original 11/70, it has essentially the same instruction timing as an
11/45 or 11/70 when expressed in clock cycles. The 11/45 and 11/70 CPUs
ran with a 150 ns clock period (ignoring clock stretching here), thus a
6.7 MHz clock. A register-register operation takes 2 cycles, a
"mov r0,(r1)+" for example 5 cycles.

Because the CPI (cycles per instruction) of the 11/70 and the w11a is very
similar and both have a good cache, the w11a should simply be 50/6.7, or a
factor of 7.5, faster than an 11/70.

The 11/70 and the w11a have some pipelining: instruction fetch and
decode/operate can overlap for register destination instructions. The
J11 is more pipelined; there the fetch, decode, and operate stages can
overlap. On the J11, register-register instructions therefore take 1 cycle
in the best case, a "mov r0,(r1)+" for example 3 cycles.

Therefore a 50 MHz w11a will not be 2.5 times faster than a 20 MHz J11,
maybe just 1.5 times faster. The w11a is intentionally implemented in a
quite simple and conservative way; the prime goal was to get it right and
working, not to get it fast.
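
To make the arithmetic above explicit, here is a small Python sketch. It only
uses the cycle counts and clock rates quoted in this mail (best case for the
J11, no clock stretching or cache misses); the names CLOCK_NS, CYCLES,
exec_time_ns and speedup are made up for this illustration.

```python
# Clock periods in ns: 11/70 at 150 ns (about 6.7 MHz), w11a at 50 MHz,
# J11 at 20 MHz. Cycle counts are the ones quoted in the text above.
CLOCK_NS = {"11/70": 150.0, "w11a": 20.0, "J11": 50.0}
CYCLES = {
    "11/70": {"reg-reg": 2, "mov r0,(r1)+": 5},
    "w11a": {"reg-reg": 2, "mov r0,(r1)+": 5},
    "J11": {"reg-reg": 1, "mov r0,(r1)+": 3},  # best case
}


def exec_time_ns(cpu, instr):
    """Execution time of one instruction in ns (cycles * clock period)."""
    return CYCLES[cpu][instr] * CLOCK_NS[cpu]


def speedup(fast, slow, instr):
    """How many times faster 'fast' executes 'instr' than 'slow'."""
    return exec_time_ns(slow, instr) / exec_time_ns(fast, instr)


if __name__ == "__main__":
    for instr in ("reg-reg", "mov r0,(r1)+"):
        print(instr,
              "w11a vs 11/70:", speedup("w11a", "11/70", instr),
              "w11a vs J11:", speedup("w11a", "J11", instr))
```

With these numbers the w11a comes out 7.5 times faster than the 11/70 for both
instructions, but only 1.25 to 1.5 times faster than the J11, because the J11
needs fewer cycles per instruction.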

At some later time maybe I'll try a really fast design, with separate
instruction and data caches and significantly more parallelism than
the J11 had.

...
 IIST is needed for RSX to be happy (the only OS that supports the 11/74),
 and you also need to implement parts of the memory bus behaviour with
 interlocking. You can ignore the MK11 box CSRs, even though it will look
 a little funny, but you do need separate DL11s for each CPU core, along with
 the rest of the I/O bus, or else things will probably not work. The 11/74
 is a shared memory machine, but not shared I/O bus.

I'm fully aware of this. The MP version will have one I/O bus per CPU,
a shared memory, an ASRB interlock, and caches with proper cache coherency.

It's true that RSX is the only OS that supports an 11/74. Unfortunately I
don't have an RSX-11M-PLUS license. So the plan is to patch 2.11BSD to support
an MP system. Sounds like a long shot, but after looking into the kernel
sources I concluded that funneling or 'big kernel lock' style MP support seems
to be quite feasible. It will not scale well, but for a 'dual-core' this is
likely good enough.
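
For readers who don't know the term, here is a minimal Python sketch of the
'big kernel lock' idea, with threads standing in for CPUs. This is not
2.11BSD code; kernel_lock, syscall, cpu and run are hypothetical names chosen
only for this illustration.

```python
import threading

kernel_lock = threading.Lock()  # the single 'big kernel lock'
syscall_count = 0               # stands in for shared kernel state


def syscall(work):
    """Funnel every kernel entry through the one lock."""
    global syscall_count
    with kernel_lock:
        # Only one 'CPU' at a time can be in here, so the existing
        # uniprocessor kernel code needs no finer-grained locking.
        syscall_count += 1
        return work()


def cpu(n_calls):
    """One simulated CPU that repeatedly enters the kernel."""
    for _ in range(n_calls):
        syscall(lambda: None)


def run(n_cpus=2, n_calls=1000):
    """Run n_cpus simulated CPUs and return the total kernel entries."""
    global syscall_count
    syscall_count = 0
    threads = [threading.Thread(target=cpu, args=(n_calls,))
               for _ in range(n_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return syscall_count
```

User-mode code runs in parallel on all CPUs, but kernel time is serialized,
which is why this approach does not scale beyond a few CPUs.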

     Walter