After doing the "fastest" possible implementation, I realized that
Dwight's "smallest" code is far from the smallest. Behold the trivial
solution on the other end of the spectrum:
div10:
ld de,-10 ; divisor
xor a ; quotient accumulator; clear carry
loop:
inc a
add hl,de
jr c,loop
sbc hl,de ; undo last subtraction; remainder in L
dec a ; we looped one time too many
That is 11 bytes. It is slow as dirt, but if the game is "smallest wins",
then this does better than your current 16-byte program.
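
For anyone who wants to check the loop without firing up an emulator, here is a rough Python model of the routine. It is only a sketch: the function name, the explicit 8-bit and 16-bit masking, and the way carry is derived are my own modelling assumptions, not Z80 code, and the model ignores timing entirely.

```python
def div10(hl):
    """Model of the 11-byte loop; returns (A, L) as left in the registers."""
    de = (-10) & 0xFFFF          # ld de,-10  -> 0xFFF6 in 16 bits
    a = 0                        # xor a
    while True:
        a = (a + 1) & 0xFF       # inc a (8-bit register, wraps at 256)
        total = hl + de          # add hl,de
        carry = total > 0xFFFF   # carry set means HL was >= 10 before the add
        hl = total & 0xFFFF
        if not carry:            # jr c,loop falls through when carry is clear
            break
    hl = (hl - de) & 0xFFFF      # sbc hl,de with carry clear: adds the 10 back
    a = (a - 1) & 0xFF           # dec a
    return a, hl & 0xFF          # quotient in A, remainder in L


# Compare against Python's own divmod for a few inputs.
for n in (0, 9, 10, 1234, 2559):
    assert div10(n) == divmod(n, 10)
```

In this model the remainder is right for any 16-bit input, but the quotient lives in the 8-bit A register, so it wraps for inputs of 2560 and above.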