Pete wrote:
> You must be thinking of some different 6502 to the rest of us :-) As
> Sellam said, no 6502 opcode takes less than two clock cycles to
> execute, and most take more (up to 7): the only 2-cycle instructions are
> the ones with implied addressing, like RTS, CLI, TAX, ...
Not RTS; that takes six cycles.
> There's no pipelining at all in a 6502. No overlap of instructions
> whatsoever.
There is a little bit of pipelining internally, but it's not really
obvious. The last ALU operation of an instruction is generally done
during the same clock cycle as the fetch of the next instruction.
For instance, when you do an "ADC #35" instruction (add with carry
immediate), it's a two-cycle instruction, but it really takes three
cycles to complete -- the third cycle is overlapped with the following
instruction's fetch. During the first cycle the opcode is fetched,
during the second cycle the immediate operand is fetched, and during the
third cycle, which is the first cycle of the next instruction, the actual
add occurs.
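A rough sketch of that overlap, in Python, purely as an illustration. The
names (PROGRAM, timeline) and the "next instruction" placeholder are made up
for this example, and the model only tracks which activity lands in which
cycle; it is not a description of the real chip's internal sequencing.

```python
from collections import defaultdict

# Each entry: (mnemonic, bus activity per documented cycle, deferred internal op).
# Assumption for this sketch: the deferred op happens one cycle after the
# instruction's last bus cycle, i.e. during the next instruction's opcode fetch.
PROGRAM = [
    ("ADC #35", ["fetch opcode", "fetch immediate operand"], "add into A"),
    ("next instruction", ["fetch opcode"], None),  # hypothetical follower
]

def timeline(program):
    """Return {cycle number: [activities]} for the given instruction list."""
    events = defaultdict(list)
    cycle = 1
    for mnemonic, bus_cycles, deferred in program:
        for activity in bus_cycles:
            events[cycle].append(f"{mnemonic}: {activity}")
            cycle += 1
        if deferred is not None:
            # Do not advance the cycle counter: the deferred work shares the
            # cycle with whatever the following instruction does first.
            events[cycle].append(f"{mnemonic}: {deferred} (overlapped)")
    return dict(sorted(events.items()))

for cycle, activities in timeline(PROGRAM).items():
    print(cycle, "; ".join(activities))
```

Cycle 3 ends up listing both the ADC's add and the following opcode fetch,
which is why the instruction is counted as two cycles even though its work
spans three.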
I've spent some time working on a reimplementation of the 6502 in a
Xilinx FPGA. It's actually fairly difficult to design to match the
exact number of clock cycles for each instruction. It's much easier
if you allow instructions to take more cycles, which is the approach
taken by the OpenCores version. That still lets you run it much faster
than the real thing, but is no good for things that depend on the
exact cycle counts, such as the Apple II RWTS routines (low-level
disk access).
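To see why extra cycles hurt timing-dependent code, here is a small Python
sketch that counts cycles for a generic delay loop (LDX #n / DEX / BNE). The
loop is a made-up example, not taken from RWTS, and the
extra_per_instruction parameter is a hypothetical model of a core that is
slower per instruction; it does not describe how the OpenCores version
actually behaves. The base counts are the documented ones: LDX immediate 2,
DEX 2, BNE 3 when taken within a page and 2 when not taken.

```python
def delay_loop_cycles(n, extra_per_instruction=0):
    """Cycles for:  LDX #n / loop: DEX / BNE loop   (n in 1..255, no page crossing)."""
    LDX_IMM, DEX, BNE_TAKEN, BNE_NOT_TAKEN = 2, 2, 3, 2
    e = extra_per_instruction  # hypothetical penalty added to every instruction
    total = LDX_IMM + e
    for i in range(n):
        total += DEX + e
        is_last_pass = (i == n - 1)
        # The branch falls through only on the final pass, when X reaches zero.
        total += (BNE_NOT_TAKEN if is_last_pass else BNE_TAKEN) + e
    return total

exact = delay_loop_cycles(10)
slower = delay_loop_cycles(10, extra_per_instruction=1)
print(exact, slower, slower - exact)
```

Even a single extra cycle per instruction stretches this ten-pass loop by 21
cycles, so code that counts cycles to stay in step with hardware would drift
out of sync no matter how fast the clock runs.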
Part of that difficulty arises because it is *very* desirable for the
data bus to be latched in a single place in the FPGA (preferably at
the I/O buffer), and the data then distributed to the other places that
need it. The reason that's desirable is that you don't want the data
setup time to vary depending on how the data is being used, which would
result in two problems: a large data setup time requirement, and the
possibility that setup time violations yield anomalies such as a LDA
instruction setting the accumulator to zero, but NOT setting the zero
flag (e.g., when the bus data is zero when the accumulator is latched,
but non-zero when the flag values are produced).
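The LDA anomaly can be mimicked with a toy Python model. Everything here is
an assumption made for illustration: the bus() function, the time units, the
point at which the bus changes, and the 0x5A value. It only shows the logical
effect of sampling the bus at two different moments versus once.

```python
def bus(t, change_at=10, before=0x00, after=0x5A):
    """Toy data bus: reads `before` until time `change_at`, then `after`."""
    return before if t < change_at else after

def lda_split_sampling(t_acc, t_flag):
    """Accumulator and flag logic each look at the bus at their own moment."""
    a = bus(t_acc)
    z = (bus(t_flag) == 0)  # Z flag derived from a separate, later sample
    return a, z

def lda_single_latch(t_latch):
    """Bus is latched once; accumulator and Z flag both use the latched copy."""
    latched = bus(t_latch)
    return latched, latched == 0

print(lda_split_sampling(t_acc=8, t_flag=12))  # A is zero, yet Z is not set
print(lda_single_latch(8))                     # consistent: zero, Z set
print(lda_single_latch(12))                    # consistent: non-zero, Z clear
```

With a single latch the result may still be the wrong byte if setup time is
violated, but the accumulator and the flags at least agree with each other.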
> There aren't two CPU cycles per clock cycle. Perhaps you're thinking of
> the fact that the 6502 uses a two-phase clock, and does part of the CPU
> cycle during phi-1, and part during phi-2?
Perhaps the original poster thought that, but it's just the old standard
two-phase NMOS logic. It takes two phases to do just about anything
internally, so it's not a matter of doing two things sequentially in
one clock cycle. (A small number of things occur in parallel in some
cycles, though.)
Eric