Pipelining and Dec Jupiter thoughts....

Fri May 7 01:06:44 CDT 2021

Chris - great and interesting overview. Do you have a reading list for more
details? Thanks!

Lee Courtney

On Thu, May 6, 2021 at 7:35 PM Chris Zach via cctalk <cctalk at classiccmp.org>
wrote:

>
> > Sort of.  But while a lot of things happen in parallel, out of order,
> speculatively, etc., the programming model exposed by the hardware still is
> the C sequential model.  A whole lot of logic is needed to create that
> appearance, and in fact you can see that all the way back in the CDC 6600
> "scoreboard" and "stunt box".  Some processors occasionally relax the
> software-visible order, which tends to cause bugs, create marketing issues,
> or both -- Alpha comes to mind as an example.
>
> Interesting to see this.
>
> I've been reading a lot recently about the Jupiter/Dolphin project and
> the more I read the more I understand why it just could not be done. At
> the time (and to an extent even now) the only way to really improve a
> system's performance was to pipeline the processor, and the PDP-10
> instruction set just didn't lend itself to that.
>
> They had a great concept: An Instruction fetch/decode system (IBOX), an
> execution engine (EBOX), the obligatory vector processor or FPU (HBOX)
> and of course the memory system (MBOX). Break the process up into steps
> and have the parts all work in parallel to boost performance.
>
> Unfortunately they started to find way too many cases where a fetched
> instruction's indexed or indirect address depended on an AC that was
> still being changed by another instruction in the EBOX. This would blow
> out all the prefetched work in the pipe, forcing the IBOX to do a costly
> reload.
>
> Likewise branch prediction couldn't be done well because most branches
> and skips depended on the value in the AC which was once again usually
> being modified in the EBOX down the pipe. As soon as it was modified the
> pipe had to be flushed and reloaded. It looks like they tried to put
> that logic into the IBOX to catch these issues, but that resulted in a
> flat processor that wasn't going to benefit from any parallelism, an
> endless series of bugs, and an IBOX that was pretty much running with
> its own EBOX.
>
> It got worse when they realized that the Extended memory segments in the
> 2060 architecture totally wrecked the concept of an instruction
> decoder/execution box. There were just too many places where an indirect
> instruction to another section, itself based on the ACs, would
> result in the IBOX tossing the queue and invalidating the translation
> buffers. Increasing the translation buffer helped (I think that's one of
> the things they did on the final 2065 to make it faster) but they
> couldn't make it big and fast enough. I guess an indirect jump
> instruction based on comparing the AC to an indirect address pointing to
> an extended segment would be enough to make any decoder just cry.
>
> It's sad to read; you can almost see them realizing it was doomed. The
> Foonly F1 was a screamer, but it was basically the KA10 instruction set
> and couldn't run extended memory segments like the 2060. And when they
> tried to do the same thing with the F4 it came out to be a little slower
> than a 2060. I used to think they put only one extended segment in the
> 2020 to cripple the box, but maybe they started running into the same
> problem and ran out of microcode space to try and address it.
>
> Couple this with the fact that many of the 20-series programs were
> written in assembler (and why not, it was an amazing thing to program)
> and you just had too many programs with cool bespoke code that would
> totally trash a pipeline. Fixing compilers to order instructions properly
> could have worked, but since people just wrote in assembler it wasn't
> going to happen, and they weren't about to re-code their apps to please
> the new scheduler God.
>
> The VAX instruction set was a lot less beautiful, but it could be
> pipelined more easily, especially with the dedicated MMU, so they took the
> people and pipelined the hell out of the 780, resulting in the nifty
> 8600/8650 and later the 8800s. DEC learned their lesson when they built
> Alpha, and even Intel realized that their instruction set needed to be
> pipelined for the Pentium Pro and above processors.
>
> Ah well. I don't think it was evil marketing or VAX monsters that killed
> the KC10; it was simply the fact that the amazing instruction set
> couldn't be pipelined to make it more efficient in hardware, and the
> memory management system wasn't as efficient as the PDP-11/VAX MMU concept.
>
>
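
The AC hazard described above can be sketched as a toy model. This is only
an illustration, not the actual Jupiter design: the Instr fields, the
pipeline depth of 3, and the 3-cycle flush penalty are all assumptions made
up for the example, and the mnemonics are just labels. The idea is that the
IBOX needs an AC to form an address while an earlier instruction that
writes that AC is still in the pipe, so the prefetched work is thrown away.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    name: str                           # label only; not decoded
    writes_ac: Optional[int] = None     # AC the EBOX will modify, if any
    addr_uses_ac: Optional[int] = None  # AC the IBOX needs to form the address


def count_cycles(program: List[Instr], depth: int = 3,
                 flush_penalty: int = 3) -> Tuple[int, int]:
    """Return (cycles, flushes) for a toy in-order pipeline.

    Assumption: an AC written by any of the last `depth` instructions is
    still "in flight", and needing it for an address forces a flush.
    """
    cycles = 0
    flushes = 0
    in_flight: List[Optional[int]] = []  # ACs written by recent instructions
    for ins in program:
        cycles += 1
        if ins.addr_uses_ac is not None and ins.addr_uses_ac in in_flight:
            flushes += 1
            cycles += flush_penalty
            in_flight = []               # pipe drained, earlier writes are done
        in_flight.append(ins.writes_ac)
        in_flight = in_flight[-depth:]   # older writes have retired
    return cycles, flushes


# No address depends on a recently written AC: the pipe stays full.
independent = [
    Instr("MOVE", writes_ac=1),
    Instr("ADD", writes_ac=2),
    Instr("MOVEM"),
    Instr("JRST"),
]

# Each address depends on the AC the previous instruction just wrote.
dependent = [
    Instr("MOVE", writes_ac=1),
    Instr("ADD", writes_ac=2, addr_uses_ac=1),
    Instr("SKIPE", addr_uses_ac=2),
    Instr("JRST", addr_uses_ac=1),
]

print(count_cycles(independent))  # (4, 0)
print(count_cycles(dependent))    # (10, 2)
```

Under these made-up numbers the same four instructions take 4 cycles when
they are independent and 10 when each address waits on the previous AC
write, which is the kind of code hand-written assembler tends to produce.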

-- 
Lee Courtney
+1-650-704-3934 cell