Pipelining and Dec Jupiter thoughts....
Chris Zach
cz at alembic.crystel.com
Fri May 7 04:55:24 CDT 2021
http://bitsavers.informatik.uni-stuttgart.de/pdf/dec/pdp10/
Has a *lot* of stuff. As I start cranking up my brain to drag BLT out of
the shed and start working on it, I'm finding this stuff to be a serious
refresher.
C
On 5/7/2021 2:06 AM, Lee Courtney wrote:
> Chris - great and interesting overview. Do you have a reading list for
> more details? Thanks!
>
> Lee Courtney
>
> On Thu, May 6, 2021 at 7:35 PM Chris Zach via cctalk
> <cctalk at classiccmp.org> wrote:
>
>
> > Sort of. But while a lot of things happen in parallel, out of
> > order, speculatively, etc., the programming model exposed by the
> > hardware still is the C sequential model. A whole lot of logic is
> > needed to create that appearance, and in fact you can see that all
> > the way back in the CDC 6600 "scoreboard" and "stunt box". Some
> > processors occasionally relax the software-visible order, which
> > tends to cause bugs, create marketing issues, or both -- Alpha
> > comes to mind as an example.
>
> Interesting to see this.
>
> I've been reading a lot recently about the Jupiter/Dolphin project, and
> the more I read the more I understand why it just could not be done. At
> the time (and to an extent even now) the only way to really improve a
> system's performance was to pipeline the processor, and the PDP-10
> instruction set just wasn't easy to do that with.
>
> They had a great concept: an instruction fetch/decode system (IBOX), an
> execution engine (EBOX), the obligatory vector processor or FPU (HBOX),
> and of course the memory system (MBOX). Break the process up into steps
> and have the parts all work in parallel to boost performance.
>
> Unfortunately they started to find way too many cases where an indirect
> instruction would be fetched that was based on an AC which was being
> changed by another instruction still in the EBOX. This would blow out
> all the prefetched work in the pipe, forcing the IBOX to do a costly
> reload.
>
> Likewise branch prediction couldn't be done well because most branches
> and skips depended on the value in the AC, which was once again usually
> being modified in the EBOX down the pipe. As soon as it was modified the
> pipe had to be flushed and reloaded. It looks like they tried to put
> logic into the IBOX to catch these issues, but that resulted in a flat
> processor that wasn't going to benefit from any parallelism, an endless
> series of bugs, and an IBOX that was pretty much running with its own
> EBOX.
>
> It got worse when they realized that the extended memory segments in the
> 2060 architecture totally wrecked the concept of an instruction
> decoder/execution box. There were just too many places where an indirect
> instruction to another section, which was then based on the ACs, would
> result in the IBOX tossing the queue and invalidating the translation
> buffers. Increasing the translation buffer helped (I think that's one of
> the things they did on the final 2065 to make it faster) but they
> couldn't make it big and fast enough. I guess an indirect jump
> instruction based on comparing the AC to an indirect address pointing to
> an extended segment would be enough to make any decoder just cry.
>
> It's sad to read; you can almost see them realizing it was doomed. The
> Foonly F1 was a screamer, but it was basically the KA10 instruction set
> and couldn't run extended memory segments like the 2060. And when they
> tried to do the same thing with the F4 it came out to be a little slower
> than a 2060. I used to think they put only one extended segment in the
> 2020 to cripple the box, but maybe they started running into the same
> problem and ran out of microcode space to try and address it.
>
> Couple this with the fact that many of the 20-series programs were
> written in assembler (and why not, it was an amazing thing to program)
> and you just had too many programs with cool bespoke code that would
> totally trash a pipeline. Fixing compilers to order instructions
> properly could have worked, but since people just wrote in assembler it
> wasn't going to happen, and they weren't about to re-code their apps to
> please the new scheduler God.
>
> The VAX instruction set was a lot less beautiful, but it could be
> pipelined more easily, especially with the dedicated MMU, so they took
> the people and pipelined the hell out of the 780, resulting in the nifty
> 8600/8650 and later the 8800s. DEC learned their lesson when they built
> Alpha, and even Intel realized that their instruction set needed to be
> pipelined for the Pentium Pro and above processors.
>
> Ah well. I don't think it was evil marketing or VAX monsters that killed
> the KC10; it was simply the fact that the amazing instruction set
> couldn't be pipelined to make it more efficient for hardware and the
> memory management system wasn't as efficient as the PDP-11/VAX MMU
> concept.
>
>
>
> --
> Lee Courtney
> +1-650-704-3934 cell
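
A rough way to see the AC hazard described in the quoted message above is
a toy model in Python. This is only a sketch under stated assumptions, not
how the Jupiter IBOX actually worked: the Instr record, the count_flushes
helper, the three-stage depth, and the sample instruction strings are all
made up for illustration. The one rule modeled is that if an instruction's
address calculation reads an AC that an instruction still in flight
writes, the prefetched work is thrown away.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """Hypothetical, heavily simplified PDP-10-style instruction."""
    name: str
    writes_ac: Optional[int] = None  # AC modified in the EBOX, if any
    index_ac: Optional[int] = None   # AC the IBOX reads to form the address


def count_flushes(program: List[Instr], depth: int = 3) -> int:
    """Count how often prefetched work must be discarded.

    Toy rule (an assumption, not the real design): an instruction whose
    address calculation reads an AC written by any of the previous
    `depth - 1` instructions used a stale value, so the pipe is flushed.
    """
    flushes = 0
    for i, instr in enumerate(program):
        if instr.index_ac is None:
            continue  # no AC involved in the address, nothing to go stale
        in_flight = program[max(0, i - (depth - 1)):i]
        if any(prev.writes_ac == instr.index_ac for prev in in_flight):
            flushes += 1
    return flushes


# Dependent instructions back to back, as hand-written assembler often has.
tight = [
    Instr("ADDI 1,5", writes_ac=1),
    Instr("MOVE 2,@0(1)", writes_ac=2, index_ac=1),
    Instr("JRST @0(2)", index_ac=2),
]

# Same AC dependency, but with unrelated work scheduled in between.
spaced = [
    Instr("ADDI 1,5", writes_ac=1),
    Instr("MOVEI 4,0", writes_ac=4),
    Instr("MOVEI 5,0", writes_ac=5),
    Instr("MOVE 2,@0(1)", writes_ac=2, index_ac=1),
]

print(count_flushes(tight))   # 2
print(count_flushes(spaced))  # 0
```

Spacing the dependent instruction away from the AC update is the kind of
reordering a scheduling compiler could do; hand-written assembler gives no
such guarantee, which is the problem the quoted message describes.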