You're quite right, but I actually meant that there aren't any instructions
which use the ALU with more than one register as an input. If you consider
the instructions which do use the ALU, you can see that a single register
set, implemented as a RAM block, would let you transfer data from the
register RAM outputs through the ALU and back into the registers in a
single operation. That's what makes this architecture so thrifty: you can
send PCL through the ALU, adding zero with carry-in set, and write the
result back to PCL, capturing the carry out if one is generated; then, if
that carry is true, add zero with carry to PCH, again storing the result
in the source register.
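In rough C terms (the names and helper below are mine, purely for
illustration), the PC increment works out to something like this:

    /* Sketch of a 16-bit program counter incremented through a single
     * 8-bit ALU: add zero to PCL with carry-in set, then touch PCH on a
     * second pass only if the low byte wrapped. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t pcl, pch;              /* PC low/high bytes in the register RAM */

    static uint8_t alu_add(uint8_t a, uint8_t b, int cin, int *cout)
    {
        unsigned sum = (unsigned)a + b + (cin ? 1 : 0);
        *cout = (sum > 0xFF);             /* carry out of bit 7 */
        return (uint8_t)sum;
    }

    static void increment_pc(void)
    {
        int carry;
        pcl = alu_add(pcl, 0x00, 1, &carry);      /* PCL + 0 + 1 -> PCL */
        if (carry)
            pch = alu_add(pch, 0x00, 1, &carry);  /* fix PCH only on a wrap */
    }

    int main(void)
    {
        pcl = 0xFF; pch = 0x12;               /* PC = $12FF, about to wrap */
        increment_pc();
        printf("PC = $%02X%02X\n", pch, pcl); /* prints PC = $1300 */
        return 0;
    }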
In reality there are several operations which use the register set as both
source and destination, but none which use TWO registers as operands and
then also target a register as the destination. What that allows is using
one RAM location as PCH, one as PCL, one as SP, and one for each of the
registers X, Y, and A. Because of the way the thing works, the logic paths
are simple and straightforward to steer via a single data bus from the ALU
back to the register inputs. That also explains why an extra cycle is
needed whenever an address crosses a page boundary.
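To see why the page crossing costs that extra cycle, here's a small C
sketch (the function is made up; the cycle counts are for something like
LDA abs,X):

    #include <stdint.h>

    /* The 8-bit ALU adds the index to the low byte of the base address
     * first; only if that add carries out does a second ALU pass fix up
     * the high byte, and that second pass is the extra cycle. */
    int cycles_for_indexed_read(uint16_t base, uint8_t index)
    {
        uint8_t  lo  = (uint8_t)base;        /* low byte of the base address */
        unsigned sum = (unsigned)lo + index; /* first ALU pass */
        int crossed  = (sum > 0xFF);         /* carry out: high byte is stale */
        return crossed ? 5 : 4;              /* 4 cycles, +1 on a page crossing */
    }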
If you constrain your thinking to the logic components which were available
back in the mid '70s, e.g. the 74181 (ALU) and 74189 (for the register
set), and consider what was on the data bus when a "float" was encountered
during a read, namely the PCH, you begin to see the rudiments of this
processor's internal architecture. Moreover, if you think of the
"pipelining" used by the 650x not in terms of the synchronous pipelining
commonly used today, but of pipelining the control structure so that the
data flow could be managed not with edge-triggered flip-flops but with
gated latches, a la the 7475, then you see how the timing was developed.
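If it helps, here's a toy C model (my own, nothing to do with the actual
die) of the difference between the two storage styles:

    #include <stdint.h>

    struct dff   { uint8_t q; int last_clk; };   /* edge-triggered flip-flop */
    struct latch { uint8_t q; };                 /* 7475-style gated latch   */

    /* A flip-flop samples its input only on the rising clock edge. */
    void dff_tick(struct dff *f, int clk, uint8_t d)
    {
        if (clk && !f->last_clk)
            f->q = d;
        f->last_clk = clk;
    }

    /* A gated latch is transparent the whole time the gate (a clock
     * phase) is high, so data flows straight through during that phase -
     * which is what lets the data flow be steered by the pipelined
     * control structure rather than by edge-triggered stages. */
    void latch_tick(struct latch *l, int gate, uint8_t d)
    {
        if (gate)
            l->q = d;
    }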
The ALU was always a path for data from the registers back to the
registers' input bus. The data bus output latch was, of course, taking
inputs from this as well, and the output data, coincidentally, followed the
rising edge of the phase-2 clock by about the same amount of time as the
valid addresses followed the falling edge. Since register-to-register
operations had to flow through the ALU, and since the registers had a
common input path, only one register could be targeted at a time. Since
the register set is a RAM, you couldn't do it any other way. If separate
registers had been used, the number of multiplexers required would have
made the chip much larger.
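A toy C model of the register set as a small RAM with a single write path
may make that clearer (the names are mine, purely illustrative):

    #include <stdint.h>

    enum reg { REG_A, REG_X, REG_Y, REG_SP, REG_PCL, REG_PCH, REG_COUNT };

    static uint8_t reg_ram[REG_COUNT];   /* 74189-style register RAM */

    /* One register-to-register step, e.g. TXA: read the one source,
     * route it through the ALU (a plain pass-through here), write the
     * one destination.  One read, one bus, one write - so only a single
     * register can be the target in any given cycle. */
    void reg_transfer(enum reg dst, enum reg src)
    {
        uint8_t value = reg_ram[src];    /* single read from the RAM */
        reg_ram[dst] = value;            /* single write back into the RAM */
    }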
The operations on the accumulator which required either immediate data or
data from memory were served by an impending operand register (IOR) which
was loaded from the last memory fetch prior to the execution of the
operation. Using that operand took a cycle, but didn't involve the data
bus, so the work could overlap the fetch of the next opcode: the processor
knew that the impending operand register was not involved in that fetch,
and that the one register guaranteed to be unaffected by an opcode fetch
was the IOR.
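Again in rough C terms (the structure and field names are invented, just
for illustration), the overlap works out to something like this:

    #include <stdint.h>

    struct cpu {
        uint8_t a;      /* accumulator */
        uint8_t ior;    /* impending operand register, last byte fetched */
        int     carry;
    };

    /* Cycle N: the operand byte arrives from memory and is latched. */
    void latch_operand(struct cpu *c, uint8_t bus_data)
    {
        c->ior = bus_data;
    }

    /* Cycle N+1: while the external bus is already fetching the next
     * opcode, the ALU combines A with the already-latched IOR.  The
     * opcode fetch can't disturb the IOR, so the two activities don't
     * collide.  (Flags other than carry are left out of this sketch.) */
    void execute_adc(struct cpu *c)
    {
        unsigned sum = (unsigned)c->a + c->ior + (c->carry ? 1 : 0);
        c->carry = (sum > 0xFF);
        c->a = (uint8_t)sum;
    }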
Dick
-----Original Message-----
From: Hans Franke <Hans.Franke(a)mch20.sbs.de>
To: Discussion re-collecting of classic computers
<classiccmp(a)u.washington.edu>
Date: Friday, August 27, 1999 12:45 PM
Subject: Re: FPGAs and PDP-11's
> Well, the 650x is a VERY thrifty architecture. It has no memory-to-memory
> operations, nor does it have any operations involving more than one
> register at a time.
TXA ? (Don't kill me :)
[...using 'only' one ALU...]
Not uncommon back then and very efficient. I still believe the 65xx
is one of the best - the instruction set is well defined to get
the maximum out of minimal hardware. You can see the function
blocks click just by looking at the instructions.
[... about resources]
Exactly, that's the main problem with most %used numbers.
> I've taken a good hard look at implementing the 6500 core in XILINX and
> find that performance, which is VERY much of interest, is impacted most
> by ALU design. Now, the Virtex CLB allows a single CLB to function as a
> two-bit full-adder. If one wants the best performance/resource allocation
> tradeoff, I'm nearly convinced that the best way might be to design it
> with a 2-bit ALU slice because the resource consumption is small yet the
> delay for a 2-bit registered implementation of an 8-bit ALU would be just
> as fast as an 8-bit implementation because of the carry delay from stage
> to stage. It appears to me that the rate-determining step, then, becomes
> how fast a clock can be routed through the array. In the case of the
> 2-bit slice, it doesn't have to propagate very far to get the job done.
Well, after all, any serious attempt to bring a 6502 into an FPGA
will be about speed - and saving resources might not be the
primary goal.
> With an 8-bit implementation, there's a lot more routing delay, and at
> least four times as much delay per cycle in order to allow the carry to
> settle. Since the ALU is used more than once per machine cycle . . .
> (see where all this leads?)
More than once?
Maybe I'm just blind, but I can't see more than one ALU op per cycle.
Regards
H.
--
Vote against SPAM (German page):
http://www.politik-digital.de/spam/de/
Vote against SPAM:
http://www.politik-digital.de/spam/en/
Vote against SPAM (French page):
http://www.politik-digital.de/spam/fr/
I think, therefore I am - so that's all right, then
HRK