Hi, Walter.
"Walter F.J. Mueller" <w.f.j.mueller at gsi.de> wrote:
Johnny Billquist <bqt at softjar.se> wrote:
> "Walter F.J. Mueller"
<W.F.J.Mueller at gsi.de> wrote:
> I've also implemented a PDP-11 on an FPGA. It is a full 11/70 with
> split I&D, MMU and cache. No FPP so far. Available peripherals so far
> are DL11, LP11, KW11L, PC11, and RK11. All I/O is channeled via a
> 'remote-register-interface' onto a single bi-directional byte-stream
> interface, so the FPGA board needs a backend PC with a server program
> to handle the I/O requests.
Any plans on the FPP? It would be really nice and
useful to have.
Hi Johnny,
sure, an FPP is on the 'todo-list', but it doesn't have the highest
priority. After having put the first version on OpenCores I'd like to
add a trace/debug unit (allowing hardware breakpoints, etc.), and add
a few more peripherals, especially larger disks. Currently I have
only an RK11 controller, good enough for proof-of-principle, but
not enough for real usage.
Disks are definitely a good thing. And as usual, I'll advocate MSCP.
Even though it's not the simplest, it's simply just the best. :-)
But FPP is among the most important things in there as well, I'd say.
Lots of software that won't be happy without it.
As for traps
and double errors, feel free to ask. I don't know if I have
all the answers, but I might be able to figure them out. Besides, I also
have access to one (or three) functional 11/70 machines.
I've tested much of the implementation against simh and xxdp's, but there
are still a few loose ends regarding corner cases. It would be great to run a
few test programs on simh, a real 11/70, and my FPGA implementation,
now called w11a.
Feel free to talk with me more offlist, and we can see when we can
schedule some testing on real hardware.
The 11/53 is a
really slow machine. Not that helpful to compare with. But you
seem to push a nice number anyway. But 50 MHz... The J11 in an 11/9x machine
runs at 20 MHz, which would suggest that you should only be able to push about
2.5 times the performance, unless you do some more clever tricks.
(The 11/9x machine runs all memory as cache.)
I know, but the 11/53 is the only PDP-11 for which I know the Unix Benchmark
and thus the Dhrystone results, so it became the reference.
Even though my implementation is quite different from the organization of
the original 11/70, it has essentially the same instruction timing as an
11/45 or 11/70 when expressed in clock cycles. The 11/45 and 11/70 CPUs
ran with a 150 ns clock period (ignoring clock stretching here), thus a
6.7 MHz clock. A register-register operation takes 2 cycles, a
"mov r0,(r1)+", for example, 5 cycles.
Because the CPI (cycles per instruction) for the 11/70 and the w11a is very
similar and both have a good cache, the w11a should simply be 50/6.7, or a
factor of about 7.5, faster than an 11/70.
The 11/70 and the w11a have some pipelining: instruction fetch and
decode/operate can overlap for register-destination instructions. The
J11 is more pipelined; there the fetch, decode, and operate stages can
all overlap. Register-register instructions therefore take 1 cycle in the
best case, a "mov r0,(r1)+", for example, 3 cycles.
As a result, a 50 MHz w11a will not be 2.5 times faster than a 20 MHz J11,
maybe just 1.5 times faster. The w11a is intentionally implemented in a
quite simple and conservative way; the prime goal was to get it right and
working, not to get it fast.
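
Written out as a quick back-of-envelope calculation (C just for concreteness;
the only real inputs are the clock rates and cycle counts quoted above, and
the "mov r0,(r1)+" mix is of course only one data point):

#include <stdio.h>

int main(void)
{
    /* Cycle counts quoted above for "mov r0,(r1)+":
     * 11/70 and w11a take 5 cycles, the more pipelined J11 takes 3. */
    double mhz_1170 = 6.7, mhz_j11 = 20.0, mhz_w11a = 50.0;
    double cyc_1170 = 5.0, cyc_j11 =  3.0, cyc_w11a =  5.0;

    /* Throughput for this instruction = clock / cycles-per-instruction. */
    double ips_1170 = mhz_1170 / cyc_1170;   /* ~1.3 */
    double ips_j11  = mhz_j11  / cyc_j11;    /* ~6.7 */
    double ips_w11a = mhz_w11a / cyc_w11a;   /* 10.0 */

    printf("w11a vs 11/70: %.1f\n", ips_w11a / ips_1170);  /* ~7.5 */
    printf("w11a vs J11:   %.1f\n", ips_w11a / ips_j11);   /* ~1.5 */
    return 0;
}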
Good thinking.
But I'm surprised by some numbers here.
The J11 at 20 MHz is only slightly faster than an 11/70. In fact, if you
can throw the 11/70 into running all from cache, it might even be
slightly faster than an 11/9x.
Or so I seem to remember from looking at the numbers back when I was last
digging into this.
Maybe I'm mixing some numbers up here... What I do remember for sure is
that the 11/9x machines run at 20 MHz, and that they are not more than
maybe 1.2 times the speed of an 11/70 in general.
At some later time maybe I'll try a really fast
design, with separate
instruction and data caches and significantly more parallelism than
the J11 had.
Hmm. I wonder if that might cause headaches? There might be code out
there that requires your i-cache and d-cache to be consistent with each
other.
IIST is needed
for RSX to be happy (the only OS that supports the 11/74),
and you also need to implement parts of the memory bus behaviour with
interlocking. You can ignore the MK11 box CSRs, even though it will look
a little funny, but you do need separate DL11s for each CPU core, along with
the rest of the I/O bus, or else things will probably not work. The 11/74
is a shared-memory machine, but the I/O bus is not shared.
I'm fully aware of this; the MP version will have one I/O bus per CPU,
a shared memory with an asrb interlock, and caches with proper cache coherency.
Yes. But what I was thinking of was the fact that at a level below this,
you have the CPU issuing read-modify-write cycles to memory, and
those need to be interlocked at the memory.
At a higher level, the 11/70 was modified for asrb to always bypass
cache, and then you had two other ways to bypass cache as well. But
bypassing cache is only half the problem, as you also need to make sure
some memory operations are atomic, as seen from other CPUs.
But if you know a thing or two about cpu and memory design (which it
would appear you do), then you probably understand the problem already.
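
To make that concrete, here is a generic C sketch of an interlocked
test-and-set lock; it is not the actual asrb sequence or anything from RSX
or the w11a, and the names are made up, but it shows why the
read-modify-write on the lock byte has to be a single interlocked memory
operation:

#include <stdatomic.h>

/* A one-byte-style lock in the spirit of the asrb locks on the '74.
 * If CPU A's read and write of the lock could be split by CPU B's,
 * both CPUs could end up "owning" the lock at the same time. */
typedef atomic_flag lock_t;
#define LOCK_INIT ATOMIC_FLAG_INIT

static void lock_acquire(lock_t *l)
{
    /* The test-and-set is the interlocked RMW; on the real machine
     * this is exactly where the memory interlock has to happen. */
    while (atomic_flag_test_and_set(l))
        ;   /* spin until the current owner releases it */
}

static void lock_release(lock_t *l)
{
    atomic_flag_clear(l);
}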
By the way, you don't have to worry about cache coherency. The PDP-11/74
does not do that. Cache coherency is managed by software on the PDP-11
(well, in RSX, since that's the only system that supports the hardware).
In short, the real hardware does not implement any sort of cache coherency.
It's true that RSX is the only OS that supports an
11/74. Unfortunately I
don't have an RSX-11M-Plus license. So the plan is to patch 2.11BSD to support
an MP system. Sounds like a long shot, but looking into the kernel sources
I concluded that funneling or 'big kernel lock' style MP support seems to
be quite feasible. It will not scale well, but for a 'dual-core' this is
likely good enough.
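
Just to show the shape of the idea, a minimal sketch of such a funnel in C;
none of these names exist in 2.11BSD, and the real thing would of course
live in the assembler trap/interrupt glue:

#include <stdatomic.h>

/* Hypothetical "big kernel lock" for a dual-CPU 2.11BSD: every entry
 * into the kernel (syscall, trap, interrupt) takes one global lock,
 * so at most one CPU executes kernel code at any time. */
static atomic_flag kernel_lock = ATOMIC_FLAG_INIT;

void kernel_enter(void)            /* on syscall/trap/interrupt entry */
{
    while (atomic_flag_test_and_set(&kernel_lock))
        ;                          /* spin until the other CPU leaves */
}

void kernel_exit(void)             /* just before return to user mode */
{
    atomic_flag_clear(&kernel_lock);
}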
It's definitely doable. However, it is not that simple.
The reason why DEC chose RSX as the OS for implementing multi-processor
support is that it does not, in general, use interrupt priority levels
to serialize access to data, protect sections of code, or implement locks.
Unix does. So, in short, everywhere the interrupt priority is
changed, you potentially need to change the code, since another
processor might still get interrupts at that level and do things you
thought you had locked out.
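
Schematically, the kind of region I mean looks like this; splbio()/splx()
follow the usual BSD convention, but the buffer structure, the lock and the
helper are all hypothetical, not 2.11BSD source:

#include <stdatomic.h>

struct buf { int b_flags; };            /* stand-in, not the real struct buf */
#define B_BUSY 0x1

extern int  splbio(void);               /* raises the IPL on *this* CPU only */
extern void splx(int s);                /* restores the previous IPL         */

static atomic_flag buf_lock = ATOMIC_FLAG_INIT;

static void mark_buffer_busy(struct buf *bp)
{
    int s = splbio();                   /* enough on a uniprocessor...       */
    while (atomic_flag_test_and_set(&buf_lock))
        ;                               /* ...but another CPU ignores our IPL,
                                           so an MP kernel also needs a lock */
    bp->b_flags |= B_BUSY;
    atomic_flag_clear(&buf_lock);
    splx(s);
}

That, repeated for every spl-protected region in the kernel, is where the
work is.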
But I see your problem. It would be great if we could solve the
situation with RSX at some point...
Johnny