On 2024-06-10 10:18 a.m., Joshua Rice via cctalk wrote:
  On 10/06/2024 05:54, dwight via cctalk wrote:
 No one is mentioning multiple processors on a single die, or caches
 bigger than the complete RAM of most systems of that time. Clock speed
 was dealt with by clever register renaming, pipelining and prediction.
 Dwight
 Pipelining has always been a double-edged sword. Splitting the
 instruction cycle into smaller, faster stages that can overlap is a
 great idea, but as individual instruction latency grows, failed branch
 predictions and the pipeline flushes that follow can truly bog down the
 real-life IPS. This is ultimately what led the NetBurst architecture to
 the dead end it became.
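Back-of-envelope, the effect Josh describes: every mispredicted branch costs
roughly a pipeline refill, so the deeper the pipe, the more of the clock
advantage gets eaten. A quick Python sketch with a throwaway eff_ips()
helper of my own; the branch mix, predictor accuracy, clocks and flush
depths are all guesses for illustration, not measured values:

# Effective instructions/sec under branch mispredictions.
# All parameters are illustrative guesses, not measured values.
def eff_ips(clock_hz, base_cpi, branch_frac, mispredict, flush_cycles):
    # Each mispredicted branch stalls for a full pipeline refill.
    cpi = base_cpi + branch_frac * mispredict * flush_cycles
    return clock_hz / cpi

short = eff_ips(2.0e9, 1.0, 0.20, 0.10, 12)  # shortish pipe, modest clock
deep  = eff_ips(3.0e9, 1.0, 0.20, 0.10, 30)  # NetBurst-style deep pipe
print(f"short pipe: {short/1e9:.2f} GIPS, deep pipe: {deep/1e9:.2f} GIPS, "
      f"gain {(deep/short - 1)*100:.0f}% from 50% more clock")

With those guesses the deep pipe turns a 50% clock advantage into only
about a 16% gain in delivered IPS; the rest goes to refills.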
  
The other gotcha with pipelining is that you have to split the work into
equal-size chunks. A 16-word register file seems to be the right size for
a 16-bit ALU; 64 words for a 32-bit ALU, 256 words for a 64-bit ALU,
as a guess.
  You never see gate-level delays on a spec sheet.
  Our pipeline period is X gate delays of logic plus N delays for the
  latch (rough arithmetic sketched below, after the reference).
  S. Winograd, "How Fast Can Computers Add?", Scientific American,
  Vol. 219, No. 4 (October 1968), pp. 93-101.
  I do not think that will change, MORE's law, LESS's law and
  BIG MONEY's law notwithstanding.
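To put rough numbers on the latch point above: even if you could split T
nanoseconds of logic into n perfectly equal chunks, the clock period is
still T/n plus the latch delay, so the speedup saturates well short of n.
A sketch with a made-up period_ns() helper; the 16 ns of logic and 0.5 ns
latch are arbitrary, hypothetical figures:

# Pipeline clock period = worst-case stage logic + latch overhead.
def period_ns(total_logic_ns, n_stages, latch_ns):
    # Clock must cover the slowest stage plus latch setup/propagation;
    # assumes the logic splits into perfectly equal chunks.
    return total_logic_ns / n_stages + latch_ns

TOTAL_NS, LATCH_NS = 16.0, 0.5   # hypothetical figures
unpiped = TOTAL_NS + LATCH_NS    # single-stage baseline
for n in (1, 2, 4, 8, 16, 32):
    p = period_ns(TOTAL_NS, n, LATCH_NS)
    print(f"{n:2d} stages: {p:5.2f} ns/clock, "
          f"speedup {unpiped/p:4.1f}x (ideal {n}x)")

At 32 stages the latch delay is half the cycle, so you get about 16.5x
instead of 32x, and unequal chunks only make it worse.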
 DEC ran into another issue with the PDP-11 vs the VAX. Although the
 pipelined VAX delivered far more throughput than the PDP-11, the time
 for a single instruction cycle was much longer, which led customers
 requiring real-time operation to stick with the PDP-11, as it was much
 quicker in those operations.
Forget that, that's noise. PDP-11s were dirt cheap compared to a VAX.
 This, along with its large software back-catalog and established
 platform, led to the PDP-11 outliving its successor.

 Josh Rice
Now that makes more sense.