According to its designers, the ALU on the 8080 is 4 bits wide, and
takes 2 cycles for an 8-bit add or subtract. Presumably it takes at
least 4 cycles for a 16-bit add or subtract. The 8080 takes so many
cycles for *anything* that it's not obvious what it does internally on
any given cycle.
I, for one, would be interested in seeing whatever references on 8080
internals this kind of information comes from. Most of what I have doesn't
give any internal details. I'd always assumed that some sort of
synchronization of the writeback to the accumulator (or other destination
register), or of the flag update, ate up the extra cycle.
Given that a 16-bit add takes 11 T cycles, that would be 2 for fetch and
decode, 4 for ALU passes, and 5 for who knows what. Probably moving things to
and from internal registers, which would explain why a 16-bit increment can
be done in 6 T cycles: 2 fetch + 4 ALU passes.
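
To make the arithmetic above concrete, here is a rough sketch of a 4-bit ALU adding wider values one nibble per pass, plus the cycle bookkeeping. It is only an illustration under the assumptions stated in this thread (one ALU pass per 4 bits, 2 T cycles for fetch and decode); the function names are made up for the example and say nothing about how the real chip is wired.

```python
def nibble_add(a, b, width_bits=8):
    """Add two unsigned values 4 bits at a time, low nibble first.

    Returns (result, carry_out, alu_passes). Assumes one ALU pass per
    4-bit slice, which is the guess made in the discussion above.
    """
    passes = width_bits // 4
    result = 0
    carry = 0
    for i in range(passes):
        shift = 4 * i
        # Add the two 4-bit slices plus the carry from the previous pass.
        s = ((a >> shift) & 0xF) + ((b >> shift) & 0xF) + carry
        result |= (s & 0xF) << shift
        carry = s >> 4
    return result, carry, passes


def unexplained_cycles(total_t, fetch_t, alu_passes):
    """T cycles left over after fetch/decode and the ALU passes."""
    return total_t - fetch_t - alu_passes


if __name__ == "__main__":
    _, _, p8 = nibble_add(0x3C, 0xC7, 8)         # 8-bit add: 2 passes
    _, _, p16 = nibble_add(0x1234, 0x0FFF, 16)   # 16-bit add: 4 passes
    print("8-bit passes:", p8, "16-bit passes:", p16)
    print("16-bit add, unexplained:", unexplained_cycles(11, 2, p16))
    print("16-bit inc, unexplained:", unexplained_cycles(6, 2, p16))
```

With the figures quoted above, the 16-bit add leaves 5 T cycles unaccounted for and the 16-bit increment leaves none.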
Eric