On Thu, 13 Jun 2024, Jonathan Stone wrote:
> > The architecture designers cheated however even in the original ISA
> > in that moves from the MD accumulator did interlock. I guess they
> > figured people (either doing it by hand or by writing a compiler)
> > wouldn't get that right anyway. ;)
>
> I always assumed that was because the latency of multiply, let alone
> divide, was far too many cycles for anyone to plausibly schedule
> "useful" instructions into. Wasn't r4000 divide latency over 60 cycles?
Well, overflow and divide-by-zero checks will often take many cycles and
can run in parallel with the MDU executing the operation, but you're of
course correct that the designers made a reasonable decision there. I
just put it differently. The net result, however, is that the
architecture has never been entirely free of pipeline interlocks,
although it did indeed use to be close.
Performance figures for the R3000 would be more appropriate for the
initial MIPS I ISA revision, and reportedly that CPU executed a 32-bit
division in 35 cycles. I can imagine the R4000 needing over 60 cycles to
run a 64-bit division. Figures vary among more modern implementations,
but MDU operations continue to have significant latencies.
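To put a rough number on the scheduling argument, here is a minimal
sketch in C. It is not a model of any real pipeline: the helper name
mdu_stall_cycles() is made up for illustration, and it simply assumes
the interlock holds a move from HI/LO for however many cycles of the
operation's latency have not been covered by independent instructions.

```c
/* Reported R3000 32-bit divide latency, as quoted above; an assumption,
   not a figure taken from a datasheet. */
#define R3000_DIV_LATENCY 35

/* Hypothetical helper: cycles a move from HI/LO would stall if it is
   issued after `independent` unrelated instructions following the
   multiply or divide. Simplified: one instruction per cycle, no other
   stalls, and the result is available exactly `latency` cycles after
   the operation was issued. */
int mdu_stall_cycles(int latency, int independent)
{
    int remaining = latency - independent;
    return remaining > 0 ? remaining : 0;
}
```

Under those assumptions mdu_stall_cycles(R3000_DIV_LATENCY, 5) gives 30,
i.e. a handful of overflow and divide-by-zero check instructions hides
only a small part of the latency, and the interlock covers the rest.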
Maciej