Pipelining and Dec Jupiter thoughts....

7 May 2021

http://bitsavers.informatik.uni-stuttgart.de/pdf/dec/pdp10/
Has a *lot* of stuff. As I start cranking up my brain to drag BLT out of
the shed and start working on it I'm finding this stuff to be a serious
refresher.
C
On 5/7/2021 2:06 AM, Lee Courtney wrote:
...
 Chris - great and interesting overview. Do you have a reading list for
 more details? Thanks!
 Lee Courtney
 On Thu, May 6, 2021 at 7:35 PM Chris Zach via cctalk
 <cctalk at classiccmp.org> wrote:
     Sort of.  But while a lot of things happen in parallel, out of
     order, speculatively, etc., the programming model exposed by the
     hardware still is the C sequential model.  A whole lot of logic is
     needed to create that appearance, and in fact you can see that all
     the way back in the CDC 6600 "scoreboard" and "stunt box".  Some
     processors occasionally relax the software-visible order, which
     tends to cause bugs, create marketing issues, or both -- Alpha
     comes to mind as an example.

     Interesting to see this.

     I've been reading a lot recently about the Jupiter/Dolphin project,
     and the more I read the more I understand why it just could not be
     done. At the time (and to an extent even now) the only way to
     really improve a system's performance was to pipeline the
     processor, and the PDP-10 instruction set just wasn't easy to do
     that with.

     They had a great concept: an instruction fetch/decode system
     (IBOX), an execution engine (EBOX), the obligatory vector processor
     or FPU (HBOX), and of course the memory system (MBOX). Break the
     process up into steps and have the parts all work in parallel to
     boost performance.

     Unfortunately they started to find way too many cases where an
     indirect instruction would be fetched whose address was based on
     the AC, which was being changed by another instruction in the EBOX.
     This would blow out all the prefetched work in the pipe, forcing
     the IBOX to do a costly reload.

     Likewise branch prediction couldn't be done well because most
     branches and skips depended on the value in the AC, which was once
     again usually being modified in the EBOX down the pipe. As soon as
     it was modified the pipe had to be flushed and reloaded. It looks
     like they tried to put that logic into the IBOX to catch these
     issues, but that resulted in a flat processor that wasn't going to
     benefit from any parallelism, an endless series of bugs, and an
     IBOX that was pretty much running with its own EBOX.
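
The hazard described in the two paragraphs above can be illustrated with
a toy model. This is only a sketch under loud assumptions: the pipeline
depth, the "flush" rule, the Instr fields, and count_flushes are all
made up for illustration and are not taken from any Jupiter design
document. The mnemonic strings are just labels; the model only looks at
which AC each instruction writes, uses for its address, or tests.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instr:
    name: str                        # label only; not decoded
    writes_ac: Optional[int] = None  # AC this instruction modifies
    addr_ac: Optional[int] = None    # AC its effective address depends on
    tests_ac: Optional[int] = None   # AC a skip/jump tests

def count_flushes(program, depth=3):
    """Count instructions whose prefetched work must be discarded.

    Hypothetical rule: an instruction needs an AC for its address or
    its skip test; if any of the previous (depth - 1) instructions
    writes that AC, the write is still in flight and we count a flush.
    """
    flushes = 0
    for i, ins in enumerate(program):
        needed = {ins.addr_ac, ins.tests_ac} - {None}
        window = program[max(0, i - (depth - 1)):i]
        in_flight = {p.writes_ac for p in window} - {None}
        if needed & in_flight:
            flushes += 1
    return flushes

# Tight hand-written style: each instruction uses the AC just modified.
tight = [
    Instr("ADDI 1,1", writes_ac=1),
    Instr("MOVE 2,@0(1)", writes_ac=2, addr_ac=1),
    Instr("SKIPE 2", tests_ac=2),
    Instr("JRST LOOP"),
]

# Same dependent instructions with unrelated work scheduled in between.
spaced = [
    Instr("ADDI 1,1", writes_ac=1),
    Instr("MOVEI 5,0", writes_ac=5),
    Instr("MOVEI 6,0", writes_ac=6),
    Instr("MOVE 2,@0(1)", writes_ac=2, addr_ac=1),
    Instr("MOVEI 7,0", writes_ac=7),
    Instr("MOVEI 10,0", writes_ac=8),
    Instr("SKIPE 2", tests_ac=2),
    Instr("JRST LOOP"),
]

print(count_flushes(tight), count_flushes(spaced))
```

In this model the tight sequence flushes on both dependent instructions,
while the spaced one never does, because the AC write has left the
window before anything needs it. That is the kind of instruction
ordering a scheduling compiler could provide and hand-written assembler
generally did not.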

     It got worse when they realized that the extended memory segments
     in the 2060 architecture totally wrecked the concept of a separate
     instruction decoder and execution box. There were just too many
     places where an indirect instruction to another section, itself
     based on the ACs, would result in the IBOX tossing the queue and
     invalidating the translation buffers. Increasing the translation
     buffer helped (I think that's one of the things they did on the
     final 2065 to make it faster) but they couldn't make it big and
     fast enough. I guess an indirect jump instruction based on
     comparing the AC to an indirect address pointing to an extended
     segment would be enough to make any decoder just cry.

     It's sad to read; you can almost see them realizing it was doomed.
     The Foonly F1 was a screamer, but it was basically the KA10
     instruction set and couldn't run extended memory segments like the
     2060. And when they tried to do the same thing with the F4 it came
     out to be a little slower than a 2060. I used to think they put
     only one extended segment in the 2020 to cripple the box, but maybe
     they started running into the same problem and ran out of microcode
     space to try and address it.

     Couple this with the fact that many of the 20 series programs were
     built in assembler (and why not, it was an amazing thing to
     program) and you just had too many programs with cool bespoke code
     that would totally trash a pipeline. Fixing compilers to order
     instructions properly could have worked, but since people just
     wrote in assembler it wasn't going to happen, and they weren't
     about to re-code their app to please the new scheduler God.

     The VAX instruction set was a lot less beautiful, but could be
     pipelined more easily, especially with the dedicated MMU, so they
     took the people and pipelined the hell out of the 780, resulting in
     the nifty 8600/8650 and later the 8800s. DEC learned their lesson
     when they built Alpha, and even Intel realized that their
     instruction set needed to be pipelined for the Pentium Pro and
     later processors.

     Ah well. I don't think it was evil marketing or VAX monsters that
     killed the KC10; it was simply the fact that the amazing
     instruction set couldn't be pipelined to make it more efficient for
     hardware, and the memory management system wasn't as efficient as
     the PDP-11/VAX MMU concept.
 --
 Lee Courtney
 +1-650-704-3934 cell
