On Thu, 13 Jun 2024 at 18:22, Jonathan Stone via cctalk <
cctalk(a)classiccmp.org> wrote:
On Thursday, June 13, 2024 at 03:00:22 PM PDT, Maciej W. Rozycki via
cctalk <cctalk(a)classiccmp.org> wrote:
The architecture designers cheated however even in the original ISA in
that moves from the MD accumulator did interlock. I guess they figured
people (either doing it by hand or by writing a compiler) wouldn't get
that right anyway. ;)
I always assumed that was because the latency of multiply, let alone
divide, was far too many cycles for anyone to plausibly schedule "useful"
instructions into. Wasn't R4000 divide latency over 60 cycles?
says that double precision divide is 36 cycles, and double precision square
root is 112 (!).
What's interesting about that is that GCC's model of the R4000 says that
divide is 69 cycles; I'm not sure of the reason for the discrepancy.