It was thus said that the Great dwight elvey once stated:
Date: Sat, 26 Jan 2008 17:21:56 -0500
From: spc at
conman.org
To: cctalk at
classiccmp.org
Subject: Re: Z80 Divide by 10
It was thus said that the Great dwight elvey once stated:
230 clock cycles and no conditionals
Dwight
I came across this bit of code from
http://www.hackersdelight.org/ to
divide by 10:
unsigned int div10(unsigned int n)
{
unsigned int q;
unsigned int r;
q = (n>> 1) + (n>> 2);
q = q + (q>> 4);
q = q + (q>> 8);
q = q + (q>> 16);
q = q>> 3;
r = n - ((q << 3) + (q << 1));
return (q + ((r + 6)>> 4);
}
On the Z80, the (q>> 8) is trivial to handle, and you can probably skip
the (q>> 16) statement all-to-gether (since you're only handling 16 bits
anyway, this reduces to 0 so the statement there can be skipped). My Z80
skills are pretty weak (and I don't have any references at hand right now)
but this looks pretty straight forward to translate.
Hi
I coded this up in Forth and it does good at the divide.
The values q and r are not much good, meaning one still
has to calculate the remainder, similar to the last part of my
code.
The operation>> is really tough on a Z80 for 16 bit operations.
One needs to do thing through the a register and use rra instructions
or use the rr r instruction with a clearing of the carry before each
single bit shift. Not having a parameterized shift like the 386 has
makes it difficult.
I doubt one could beat the 230 cycles I had since each shift cost
12 cycles. This is 34 to 38 shifts depending on optimization or 408
cycles minimum without the 7 16 bit adds. This is already more
than the 230 cycles.
Where do you get 34 shifts from? I can see that being reduced down to
about 12 shifts:
/* q = (n >> 1) + (n >> 2) */
n >>= 1;
q = n;
n >>= 1;
q += n;
/* q = q + (q >> 4) */
q' = q;
q' >>= 1;
q' >>= 1;
q' >>= 1;
q' >>= 1;
q += q';
/* q = q + (q >> 8) */
q'(lo) = q(hi);
q(lo) += q'(lo);
q(hi) += CarryFlag;
/* q = q + (q >> 16) */
/* skipped because we're using 16 bits */
/* q = q >> 3 */
/* skipped because not needed */
/* r = n - ((q << 3) + (q << 1)) */
q' = q;
q' >>= 1;
q' >>= 1;
q += q'
r = n - q;
/* return q + ((r + 6) >> 4 */
r += 6;
r >>= 1;
r >>= 1;
r >>= 1;
r >>= 1;
q += r;
That's now 144 cycles for shifts, with 6 16 bit adds and two 8 bit adds.
Now that I have access to my Z80 references (but no real Z80 to test this
on), I think the following code will work:
;-------------------------------------
; DIV10 Perform division by 10
; Entry: BC - 16 bit value
; Exit: HL - quotient
; everything else trashed.
; If you need a remainder,
; then you can do
; REM = HL - ((HL << 3) + (HL << 1))
;------------------------------------
savehl defw 0
const6 defw 6
; bytes cycles
div10 ld d,b ; save n ; 1 4
ld e,c ; 1 4
; q = (n >> 1) + (n >> 2)
srl b ; 2 8
rr c ; 2 8
ld h,b ; 1 4
ld l,c ; 1 4
slr b ; 2 8
rr c ; 2 8
add hl,bc ; 1 11
; q = q + (q >> 4)
ld b,h ; 1 4
ld c,l ; 1 4
ld (savehl),hl ; 3 16
ld hl,savehl ; 3 10
and a ; 1 4
rrd ; 2 18
ld hl,(savehl) ; 3 16
add hl,bc ; 1 11
; q = q + (q >> 8)
ld b,0 ; 2 7
ld c,h ; 1 4
add hl,bc ; 1 11
; q = q + (q >> 16) skipped
; q = q >> 3 skipped
; r = n - ((q << 3) + (q << 1))
ld b,h ; 1 4
ld c,l ; 1 4
srl b ; 2 8
rr c ; 2 8
srl b ; 2 8
rr c ; 2 8
add hl,bc ; HL = q ; 1 11
ld b,h ; BC also = q ; 1 4
ld c,l ; 1 4
ex de,hl ; HL = n, DE = q; 1 4
or a ; CF = 0 ; 1 4
sbc hl,de ; n - q ; 2 15
; return q + ((r + 6) >> 4
ld de,(const6) ; 4 20
add hl,de ; 1 11
ld (savehl),hl ; 3 16
ld hl,savehl ; 3 10
and a ; 1 4
rdd ; 1 18
ld hl,(savehl) ; 3 16
add hl,bc ; 1 11
ret ; 1 10
;67 362
Well ... crap. 362 cycles. The Z80 really *sucks* on rotates, doesn't
it? I thought the RDD instruction would help, but really, it just breaks
even on the cycle count over four repetitions of SLR/RR (at least, the way
I've coded it because of the bottleneck in the use of HL).
-spc (Fun exercise though ... )