On Jun 10, 2024, at 12:18 PM, Joshua Rice via cctalk
<cctalk(a)classiccmp.org> wrote:
On 10/06/2024 05:54, dwight via cctalk wrote:
No one is mentioning multiple processors on a single die, or caches that are bigger than the complete RAM of most systems of that time.
Clock speed was dealt with by clever register reassignment, pipelining and prediction.
Dwight
Pipelining has always been a double-edged sword. Splitting the instruction cycle into smaller, faster stages that can run simultaneously is a great idea, but as the pipeline gets deeper and the latency of each individual instruction grows, failed branch predictions and the pipeline flushes that follow them can truly bog down the real-life instructions per second. This is ultimately what made the NetBurst architecture the dead end it became.
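To make that cost concrete, here is a minimal C sketch of the classic demonstration (the array size, the 128 threshold, and the repeat count are arbitrary choices made for the example): the same branchy loop runs noticeably slower over random data than over sorted data, because the unpredictable branch keeps forcing pipeline flushes. Depending on the optimization level the compiler may turn the branch into a conditional move and hide the effect, so treat this as an illustration rather than a benchmark.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N    (1 << 20)
    #define REPS 200

    /* Sum only the elements >= 128; the if() is the branch whose
       predictability we are varying. */
    static long branchy_sum(const int *data, int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            if (data[i] >= 128)
                sum += data[i];
        }
        return sum;
    }

    static void time_sum(const int *data, const char *label)
    {
        clock_t t0 = clock();
        long s = 0;
        for (int r = 0; r < REPS; r++)
            s += branchy_sum(data, N);
        printf("%s: sum=%ld  %.3f s\n", label, s,
               (double)(clock() - t0) / CLOCKS_PER_SEC);
    }

    int main(void)
    {
        static int data[N];
        for (int i = 0; i < N; i++)
            data[i] = rand() % 256;   /* random values: the branch mispredicts about half the time */
        time_sum(data, "random data");

        /* Counting sort (values are 0..255) to make the branch predictable,
           then time exactly the same loop again. */
        int counts[256] = {0};
        for (int i = 0; i < N; i++)
            counts[data[i]]++;
        for (int v = 0, i = 0; v < 256; v++)
            while (counts[v]-- > 0)
                data[i++] = v;
        time_sum(data, "sorted data");
        return 0;
    }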
RISC can do pipelining much more easily (as Cray demonstrated around 1969 with the CDC 7600). For one thing, "bypass" (forwarding a result straight from one functional unit to the next instruction that needs it, without waiting for the register write) is doable, and widely used, in machines that have both pipelining and multiple functional units. I remember it in the SiByte 1250 and/or the Raza XLR (both MIPS64, early 2000s), but I assume it was done well before then.
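As a footnote on what bypass buys you, here is a small C sketch of a toy model (a textbook five-stage IF/ID/EX/MEM/WB pipeline, ALU instructions only, made-up register numbers; none of this is specific to the machines named above). Without forwarding, a back-to-back read-after-write chain stalls while each result crawls to write-back; with an EX-to-EX bypass path the same chain runs with no stalls.

    #include <stdio.h>

    /* Toy model of read-after-write stalls in a classic five-stage pipeline.
       Without bypass, a consumer reading a register in ID must wait until the
       producer has written it back in WB (write-before-read in the same
       cycle), so a producer one instruction ahead costs 2 stall cycles and
       one two ahead costs 1.  With bypass, ALU results are forwarded straight
       into EX and cost nothing. */
    struct insn { int dst, src1, src2; };

    static int stall_cycles(const struct insn *prog, int n, int bypass)
    {
        int stalls = 0;
        for (int i = 1; i < n; i++) {
            for (int d = 1; d <= 2 && d <= i; d++) {   /* nearest producer wins */
                int p = prog[i - d].dst;
                if (p != 0 && (prog[i].src1 == p || prog[i].src2 == p)) {
                    if (!bypass)
                        stalls += 3 - d;   /* d=1: 2 stalls, d=2: 1 stall */
                    break;                 /* with bypass: forwarded, no stall */
                }
            }
        }
        return stalls;
    }

    int main(void)
    {
        /* r3 = r1 + r2 ; r5 = r3 - r4 ; r7 = r5 + r6 : back-to-back RAW chain */
        struct insn prog[] = { {3, 1, 2}, {5, 3, 4}, {7, 5, 6} };
        int n = sizeof prog / sizeof prog[0];
        printf("no bypass: %d stall cycles\n", stall_cycles(prog, n, 0));   /* 4 */
        printf("bypass:    %d stall cycles\n", stall_cycles(prog, n, 1));   /* 0 */
        return 0;
    }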
DEC ran into another issue with the PDP-11 versus the VAX. Although the pipelined architecture of the VAX gave it much higher throughput than the PDP-11, the worst-case time for a single instruction was much longer, which led customers who needed real-time response to stick with the PDP-11, as it was much quicker in those situations. This, along with its large software back-catalog and established platform, led to the PDP-11 outliving its successor.

Josh Rice
That reminds me of the Motorola 68040. I did the fastpath for an FDDI switch (doing packet switching in software) on one of those. I discovered that the VAX-like addressing modes that look so nice on the 68040 take a bunch of cycles, but there was a "RISC subset" using just the simplest addressing modes that would give single-cycle execution. So I limited my code to just those.
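Purely for illustration, here is roughly what that style looks like from C. The function names are made up, and exactly what a given compiler emits is an assumption; the point is only the difference between indexed array accesses, which tend toward the 68040's complex indexed addressing modes, and pointer-postincrement accesses, which map onto the simpler (An)+ mode.

    #include <stddef.h>
    #include <stdint.h>

    /* Running 16-bit sum over a packet buffer, written two ways. */

    /* Indexed form: each access is buf[i], which can tempt the compiler
       into indexed modes such as (d8,An,Dn.w*2). */
    uint32_t sum_indexed(const uint16_t *buf, size_t n)
    {
        uint32_t sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += buf[i];
        return sum;
    }

    /* Pointer-increment form: each access maps naturally onto the (An)+
       postincrement mode, one of the simpler 68040 addressing modes.
       (A modern compiler may well rewrite one form into the other; on a
       hand-tuned fastpath one would keep the loop in this shape.) */
    uint32_t sum_postinc(const uint16_t *buf, size_t n)
    {
        uint32_t sum = 0;
        const uint16_t *end = buf + n;
        while (buf < end)
            sum += *buf++;
        return sum;
    }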
The other weirdness was branch prediction. The 68040 had no branch prediction cache; instead it statically predicted every branch as taken. Note the difference from the usual practice, which is to predict backward branches as taken and forward branches as not taken. No problem either way, but it meant the assembly code looked a bit odd: in an if/then/else block, the unlikely case went immediately after the conditional branch (the fall-through path, not the predicted one), and the likely case came after that, at the branch target (branch taken, as predicted).
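The same likely/unlikely layout decision is what the __builtin_expect hint in GCC and Clang expresses today. A short sketch, with the caveat that the macro names below are just local conveniences and the final block placement is up to the compiler (which, on the usual convention, puts the likely case on the fall-through path, the mirror image of the 68040 layout described above):

    #include <stdio.h>

    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Sketch of a fastpath check where the error case is rare. */
    static int handle_packet(int len)
    {
        if (unlikely(len <= 0)) {
            fprintf(stderr, "bad packet length %d\n", len);   /* cold path */
            return -1;
        }
        return len;   /* hot path: normal processing */
    }

    int main(void)
    {
        printf("%d\n", handle_packet(64));
        printf("%d\n", handle_packet(-1));
        return 0;
    }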
It was fun to do 60k packets per second on a 25 MHz processor...
paul