Every time I see code like this for processors without a native DIV, I
wonder if the same code ported to x86 would indeed outperform the native
DIV. Would it? I know that on a 286 or higher, where MUL and DIV were
greatly optimized to about 12 cycles, no; but what about on the original
808x, where MUL/DIV could take as much as 144 cycles?
On the 286 and 386 you could usually beat IDIV when dividing by small
constants, especially if you knew your dividend covered less than the
full range of possible register values.
On a 386 you could play tricks with LEA to shave off cycles.
I vaguely remember a technique for dividing by 10 in which the
first two instructions were both LEA EAX,[EAX+EAX*4], each multiplying
by 5 (2 cycles each), so that together they multiply the dividend by
25... I can't remember the tricks that followed. Overall it was
probably "multiply by approximately 25.6, then right shift by 8 bits",
since 25.6/256 = 1/10. I can't for the life of me remember in what bit
of code I saw it. I could probably write it from scratch given what
I've remembered so far.
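
For what it's worth, here is a rough sketch in C of how such a routine
might go. It is a reconstruction, not the code I half-remember: it uses
three LEA-style steps rather than two, multiplying by 205 and shifting
right by 11, which is the same as multiplying by 25.625 and shifting by
8. The names div10_small and div10_small_mismatches are made up for
this example, and the result is only exact for dividends from 0 to
1028, so it relies on knowing the dividend is small.

```c
#include <stdint.h>

/* Sketch only: x / 10 for 0 <= x <= 1028, with no MUL or DIV.
 * 205/2048 = 25.625/256, slightly more than 1/10, so the truncated
 * result matches x / 10 until the excess (x/10240) pushes a dividend
 * ending in 9 over the next integer, which first happens at x = 1029.
 * Each statement maps onto one 386 instruction (shown in comments). */
uint32_t div10_small(uint32_t x)
{
    uint32_t t = x + x * 4;   /* LEA ECX,[EAX+EAX*4]  ; t = 5x   */
    t = x + t * 8;            /* LEA EAX,[EAX+ECX*8]  ; t = 41x  */
    t = t + t * 4;            /* LEA EAX,[EAX+EAX*4]  ; t = 205x */
    return t >> 11;           /* SHR EAX,11           ; 205x / 2048 */
}

/* Hypothetical helper: counts dividends in 0..limit for which the
 * shortcut disagrees with ordinary integer division. */
unsigned div10_small_mismatches(uint32_t limit)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= limit; x++) {
        if (div10_small(x) != x / 10) {
            bad++;
        }
    }
    return bad;
}
```

Checking div10_small_mismatches(1028) against the real division is the
quickest way to convince yourself of the valid range before using
anything like this.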
Eric