"Sean 'Captain Napalm' Conner" <spc(a)armigeron.com> posted some
analysis of I/O transfer loops on a Color Computer (6809):
Well, the smallest loop I can see is:
POLL LDA <$STATPORT ; 4 [1]
ANDA #TESTBIT ; 2
Bcc POLL ; 3
The poll loop is 9 cycles long, and the dataread portion (including
Or perhaps:
POLL: BITB <$STATPORT ; 4
Bcc POLL ; 3
For a poll loop of 7 cycles, assuming that B is preloaded with the
appropriate mask.
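To make the arithmetic explicit, here is a small Python sketch that totals the per-instruction cycle counts annotated above. The table holds only the counts quoted in the post. The "mnemonic plus addressing mode" keys are a hypothetical notation used for illustration.

```python
# Per-instruction cycle counts as annotated in the post (6809).
CYCLES = {
    "LDA <dp": 4,    # load A, direct-page addressing
    "ANDA #imm": 2,  # AND A with an immediate mask
    "BITB <dp": 4,   # test B against a direct-page byte; B is unchanged
    "Bcc": 3,        # short conditional branch, same cost taken or not
}


def loop_cycles(body):
    """Cycles for one pass through a loop body given as a list of keys."""
    return sum(CYCLES[instr] for instr in body)


poll_and = loop_cycles(["LDA <dp", "ANDA #imm", "Bcc"])
poll_bit = loop_cycles(["BITB <dp", "Bcc"])
print(poll_and, poll_bit)
```

Either way, every pass through the poll costs 7 to 9 cycles before the byte has even been read or stored.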
Still nowhere close to good enough. The poll loop has to go.
If the hardware is set up such that the DRQ bit is tied to an interrupt
(and, for the duration of the loop, it's the only active interrupt
source), you can get a minimum of 17 cycles:
POLL SYNC ; 2+
LDA <$DATAPORT ; 4
STA ,X+ ; 6
DECB ; 2
BNE POLL ; 3
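As a rough check of why 17 cycles still misses, this sketch converts the loop to microseconds. It assumes the CoCo's usual 0.894886 MHz CPU clock (14.31818 MHz / 16) and counts SYNC at the 2-cycle minimum used in the post. The 16 us byte period is the figure the post itself uses.

```python
CLOCK_MHZ = 0.894886  # assumed CoCo CPU clock, i.e. 14.31818 MHz / 16
BUDGET_US = 16.0      # one byte every 16 us, the figure used in the post

# Minimum cycle counts from the post; SYNC is counted at its 2-cycle minimum.
sync_loop = {"SYNC": 2, "LDA <dp": 4, "STA ,X+": 6, "DECB": 2, "BNE": 3}

cycles = sum(sync_loop.values())
per_byte_us = cycles / CLOCK_MHZ  # cycles divided by cycles per microsecond
print(f"{cycles} cycles = {per_byte_us:.1f} us per byte (budget {BUDGET_US} us)")
```

At roughly 19 us per byte, the loop cannot keep up with a byte every 16 us.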
The only way I can see in speeding this up is to tie the reading of the
Or you can partially unroll it and use D:
SYNC
LDA <$DATAPORT
POLL: SYNC ; 2+
LDB <$DATAPORT ; 4
STD ,X++ ; 8
SYNC ; 2+
LDA <$DATAPORT ; 4
DECB ; 2
BNE POLL ; 3
SYNC
LDB <$DATAPORT
STD ,X++
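which gets you down to minimum times of 14 and 11 cycles on alternate
bytes. B needs to be preloaded with (sector_size/2)-1, although as written
the LDB overwrites that count, so in practice the counter has to live
somewhere else, at the cost of a few more cycles. Unfortunately, 14 cycles
is too close to the limit.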
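To put "too close" in numbers, here is a sketch that takes the longer of the two alternating halves as the worst case and compares it with the 16 us byte period. The 0.894886 MHz clock is an assumption (the usual CoCo figure). The half-loop cycle counts are the ones annotated in the post.

```python
CLOCK_MHZ = 0.894886  # assumed CoCo CPU clock
BUDGET_US = 16.0      # byte period used in the post

# The unrolled SYNC loop alternates between two instruction groups.
first_half = 2 + 4 + 8       # SYNC, LDB <dp, STD ,X++
second_half = 2 + 4 + 2 + 3  # SYNC, LDA <dp, DECB, BNE

worst_us = max(first_half, second_half) / CLOCK_MHZ
margin = 1 - worst_us / BUDGET_US  # fraction of the byte period left over
print(first_half, second_half, f"{worst_us:.2f} us", f"{margin:.1%} margin")
```

The 14-cycle half takes about 15.6 us, leaving only around 2% of the byte period as slack.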
For a write operation, you might be able to do slightly better by using
"PULU A,B" in place of "LDD ,X++", saving one cycle.
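The one-cycle saving follows from the 6809 cycle formulas: PULS/PULU cost 5 cycles plus 1 per byte pulled, while LDD indexed costs 5 cycles plus 3 for the ,X++ mode. Below is a minimal sketch of that comparison. It assumes the write loop is rearranged so that the buffer pointer lives in U instead of X.

```python
def pul_cycles(num_bytes):
    """PULS/PULU: 5 cycles plus 1 per byte pulled."""
    return 5 + num_bytes


def ldd_indexed_cycles(mode_extra):
    """LDD indexed: 5 cycles plus the addressing-mode overhead."""
    return 5 + mode_extra


pulu_ab = pul_cycles(2)              # PULU A,B pulls two bytes; U advances by 2
ldd_postinc = ldd_indexed_cycles(3)  # the ,X++ mode adds 3 cycles
print(ldd_postinc, pulu_ab, ldd_postinc - pulu_ab)
```

PULU A,B loads A from the lower address and B from the next, the same byte order as LDD, so it can stand in directly for LDD ,X++.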
Now, given the same hardware (tying DRQ to the DATAPORT to cause the CPU
to wait until ready) plus a signal that indicates the end of the transfer
tied to the NMI, you can save an additional two cycles:
POLL LDA <$DATAPORT ; 4
STA ,X+ ; 6
BRA POLL ; 3
For 13 cycles, or on a Coco, about 14.5 uSecs (minimum). At the cost of some
Might be good enough. It would be nice to have more margin for speed
variation, and I don't think that the Coco can generate an NMI on
completion, although in this case an IRQ should do. Of course, the
hardware also would have to be designed to release the wait when the
interrupt occurs.
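One way to quantify "more margin for speed variation" is to ask how much faster than nominal the disk could spin before the 13-cycle loop falls behind. This sketch assumes the 0.894886 MHz CoCo clock and treats the 4-cycle LDA as its minimum, since the DRQ wait can stretch it.

```python
CLOCK_MHZ = 0.894886  # assumed CoCo CPU clock
BUDGET_US = 16.0      # nominal byte period from the post

cycles = 4 + 6 + 3  # LDA <dp (minimum; DRQ wait may stretch it), STA ,X+, BRA
per_byte_us = cycles / CLOCK_MHZ
# A disk fast by factor f delivers a byte every BUDGET_US / f microseconds,
# so the loop keeps up while f <= BUDGET_US / per_byte_us.
max_overspeed = BUDGET_US / per_byte_us - 1
print(f"{per_byte_us:.2f} us per byte, tolerates {max_overspeed:.1%} overspeed")
```

Under these assumptions the loop tolerates only about 10% overspeed.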
Partially unrolling this one but NOT using the completion interrupt
yields:
LDA <$DATAPORT
POLL: LDB <$DATAPORT ; 4
STD ,X++ ; 8
LDA <$DATAPORT ; 4
DECB ; 2
BNE POLL ; 3
SYNC
LDB <$DATAPORT
STD ,X++
For a minimum of 12 and 9 cycles on alternate bytes (as in the SYNC
version, B cannot really double as the counter, since LDB overwrites it).
I think this one allows sufficient margin that it should work even if the
disk is running 15% fast. Based on what Tony said, it sounds like the
Coco's FDC hardware can probably support it.
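The 15% claim can be checked the same way. The gaps between successive reads are 12 cycles (LDB to LDA) and 9 cycles (LDA to the next LDB). This sketch compares the longer gap with the byte period of a disk spinning 15% fast, again assuming the 0.894886 MHz CoCo clock.

```python
CLOCK_MHZ = 0.894886  # assumed CoCo CPU clock
BUDGET_US = 16.0      # nominal byte period from the post
OVERSPEED = 0.15      # disk spinning 15% fast, as in the text

first_half = 4 + 8       # LDB <dp, STD ,X++
second_half = 4 + 2 + 3  # LDA <dp, DECB, BNE

worst_us = max(first_half, second_half) / CLOCK_MHZ
fast_period_us = BUDGET_US / (1 + OVERSPEED)  # bytes arrive this often when fast
print(f"{worst_us:.2f} us needed vs {fast_period_us:.2f} us available")
```

About 13.4 us is needed against roughly 13.9 us available, so the 12-cycle gap still fits.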
Like I said several posts back, transferring a byte every 16 microseconds
on a sub-1 MHz processor is tricky.
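The closing point is easier to see in cycles: 16 us at just under 0.9 MHz leaves only about 14 CPU cycles per byte. This sketch tabulates the worst-case gap between reads for each loop discussed above against that budget. It assumes the 0.894886 MHz CoCo clock, and the loop labels are informal names chosen for illustration.

```python
CLOCK_MHZ = 0.894886  # assumed CoCo CPU clock
BUDGET_US = 16.0      # one byte every 16 us

budget_cycles = BUDGET_US * CLOCK_MHZ  # CPU cycles available per byte

# Worst-case gap between reads for each loop discussed above (from the post).
loops = {
    "SYNC, rolled": 17,
    "SYNC, unrolled": 14,
    "wait state + NMI": 13,
    "wait state, unrolled": 12,
}
for name, cycles in loops.items():
    print(f"{name:22s} {cycles:2d} cycles, headroom {budget_cycles - cycles:+.2f}")
```

Only the rolled SYNC loop exceeds the budget outright. The others fit, with headroom ranging from about a third of a cycle to just over two.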