On Jun 11, 2024, at 11:52 PM, ben via cctalk
<cctalk(a)classiccmp.org> wrote:
On 2024-06-10 10:18 a.m., Joshua Rice via cctalk wrote:
On 10/06/2024 05:54, dwight via cctalk wrote:
No one is mentioning multiple processors on a
single die, and cache that is bigger than the complete RAM of most systems of that time.
Clock speed was dealt with by clever register renaming, pipelining and prediction.
Dwight
Pipelining has always been a double-edged sword. Splitting the instruction
cycle into smaller, faster chunks that can run simultaneously is a great idea, but as the
pipeline deepens and each instruction's end-to-end latency grows, failed branch predictions
and the pipeline flushes that follow can truly bog down the real-life IPS. This is
ultimately what led the NetBurst architecture to the dead end it became.
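To put a number on that point, here is a back-of-envelope model (not from the thread; all figures are illustrative) of how flush penalty scales with pipeline depth:

```python
# Hypothetical sketch: average IPC once a fraction of branches
# mispredict and each misprediction flushes the pipeline.
def effective_ipc(base_ipc, branch_fraction, mispredict_rate, flush_penalty):
    """base_ipc        -- IPC with perfect prediction
    branch_fraction -- fraction of instructions that are branches
    mispredict_rate -- fraction of branches predicted wrongly
    flush_penalty   -- cycles lost per misprediction (roughly pipeline depth)
    """
    # Base cycles per instruction, plus stall cycles per instruction.
    cpi = 1.0 / base_ipc + branch_fraction * mispredict_rate * flush_penalty
    return 1.0 / cpi

# A deep NetBurst-style pipe (~30 stages) loses far more per flush than
# a classic 5-stage pipe at the same prediction accuracy (invented rates):
deep  = effective_ipc(1.0, 0.2, 0.05, 30)  # ~0.77 IPC
short = effective_ipc(1.0, 0.2, 0.05, 5)   # ~0.95 IPC
```

Same workload, same predictor; only the flush penalty differs, and the deep pipe gives up roughly a fifth of its throughput to mispredictions.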
The other gotcha with pipelining is that you have to have equal-size chunks.
A 16 word register file seems to be the right size for a 16 bit alu,
64 words for a 32 bit alu, 256 words for a 64 bit alu,
as a guess.
Huh? There is no direct connection between word length, register count, and pipeline
length.
The natural pipeline length (for a given functional unit) is the number of steps needed to
do the work, given a step that can be completed in a single clock cycle. That assumes a
pipe that long is affordable; if not it gets shorter. Not all functional units will have
the same pipeline length.
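That "steps that fit in one clock" rule, together with the latch overhead mentioned later in the thread, can be sketched in a few lines (all numbers invented for illustration):

```python
import math

# Hypothetical sketch: the natural pipeline depth is the total
# combinational logic delay divided by how much logic fits in one
# clock cycle after paying the latch overhead at each stage boundary.
def natural_stages(total_logic_ns, clock_ns, latch_ns):
    usable = clock_ns - latch_ns          # logic time available per stage
    return math.ceil(total_logic_ns / usable)

# e.g. 600 ns of adder logic, a 110 ns clock, 10 ns per latch:
natural_stages(600, 110, 10)  # -> 6 stages
```

A functional unit with more logic (say a multiplier) naturally comes out deeper at the same clock, which is why units end up with different pipeline lengths.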
The register count is a function of cost -- for the registers themselves and for the
scoreboard logic to sort out register conflicts. In modern designs that would be die
area; in older machines it would be cost in modules or transistors. For example, in the
CDC 6600, the registers (8 x 60 bit, 8 x 18 bit address, 8 x 18 bit index/count) and
their associated data path controls to/from all the functional units take up an entire
chassis, 750-ish logic modules.
You never see gate-level delays on a spec sheet.
Our pipeline is X gate delays + N delays for a latch.
Gate level delays are not interesting for the machine user to know. What is interesting
is the detailed properties of the pipelines, including whether they can accept a new
operation every cycle or just every N cycles (say, a multiplier that accepts operands
every 2 cycles); how many cycles is the delay from input to output; and whether there are
"bypass" data paths to reduce the delays from input or output conflicts. Often
these details are hard to pry out of the manufacturer; often they are not documented in
the standard data sheets or processor user manuals. But they are critical if you want to
do work such as pipeline models to drive compiler optimizers.
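A pipeline model of the kind a compiler optimizer consumes can be quite small. This is a toy sketch (unit names, intervals, and latencies are invented, not from any real data sheet): each unit has an initiation interval, i.e. how often it can accept a new operation, and a latency from input to result.

```python
# Hypothetical machine description: interval = cycles between accepted
# operations, latency = cycles from operand input to result output.
UNITS = {
    "adder":      {"interval": 1, "latency": 2},
    "multiplier": {"interval": 2, "latency": 4},  # accepts operands every 2 cycles
}

def schedule(ops):
    """Greedy in-order issue; returns (issue_cycle, result_cycle) per op.

    ops -- list of unit names, one per operation, issued in order.
    """
    next_free = {u: 0 for u in UNITS}  # cycle each unit next accepts an op
    out = []
    cycle = 0
    for unit in ops:
        # Stall until the unit's initiation interval allows another issue.
        issue = max(cycle, next_free[unit])
        next_free[unit] = issue + UNITS[unit]["interval"]
        out.append((issue, issue + UNITS[unit]["latency"]))
        cycle = issue + 1
    return out

# Two back-to-back multiplies collide on the 2-cycle interval, so the
# second one stalls; the add behind them issues as soon as it arrives.
schedule(["multiplier", "multiplier", "adder"])  # -> [(0, 4), (2, 6), (3, 5)]
```

A real model would add the bypass paths and per-operand conflicts mentioned above, but even this much is enough for a scheduler to decide what to interleave between the two multiplies.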
paul