Well, the 650x is a VERY thrifty architecture. It has no memory-to-memory
operations, nor does it have any operations involving more than one register
at a time. Additionally, if one chooses to implement it in the way the
original manufacturers did, the ALU serves not only to execute the
instruction set but also to operate on the PC and SP. This saves LOTS of
resources in the construction of the associated counter chains.
That's not to say it's easy to implement this architecture in an efficient
way, though.
You have to look at another aspect of FPGAs, however, and that's the
combined effect of routing and resource utilization. The ALTERA folks may
claim to have implemented this architecture in only 7% of the resources of
the part, but at what cost? In general, a substantial portion of the
resources available in a device, in terms, for example, of raw gate count,
is lost in the implementation of a design. In each logic cell or logic
block, there are resources which the marketing department proudly counts and
advertises, yet which, once a part of the logic cell is used, are gone
forever and unusable. The routing is another factor which plays a big role
in the way FPGAs work out. Allocating a given routing resource in a
certain way can effectively render other logic resources unusable, because
no interconnect is left with which to reach them. Consequently,
routing in a manner essential to a given level of performance for some of
the device resources can render other resources unreachable for any
practical purpose.
The marketing guys don't consider this when publishing their full-color
glossy brochures, though. If they go to work, they'll say, well, this
NAND gate is only 6% of a CLB, even though the entire CLB is used up, say,
and that pipeline register used to synchronize these functions is only
12% . . . when in reality as much as 50% of the array may be consumed by
such a design, and the remaining "half" may be very difficult to utilize
beyond 15%.
I've taken a good hard look at implementing the 6500 core in XILINX and find
that performance, which is VERY much of interest, is impacted most by ALU
design. Now, the Virtex CLB allows a single CLB to function as a two-bit
full-adder. If one wants the best performance/resource allocation tradeoff,
I'm nearly convinced that the best way might be to design it with a 2-bit
ALU slice: the resource consumption is small, yet a 2-bit registered
implementation of an 8-bit ALU would be just as fast as an 8-bit
implementation because of the carry delay from stage to stage. It
appears to me that the rate-determining step, then, becomes how fast a clock
can be routed through the array. In the case of the 2-bit slice, it doesn't
have to propagate very far to get the job done. With an 8-bit
implementation, there's a lot more routing delay, and at least four times as
much delay per cycle in order to allow the carry to settle. Since the ALU
is used more than once per machine cycle . . . (see where all this leads?)
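For anyone who wants to see the 2-bit-slice idea in concrete terms, here is a behavioural sketch in Python (not HDL, and not any vendor's or manufacturer's actual design). It assumes a single 2-bit adder reused for four passes, with the carry held in a register between passes; each loop iteration stands for one clock. The function names are made up for illustration, and the model says nothing about real clock rates or routing delay. It only shows that four short passes give the same answer as one wide add.

```python
def add8_parallel(a, b, carry_in=0):
    """8-bit add in a single step; returns (sum, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def add8_two_bit_slice(a, b, carry_in=0):
    """8-bit add as four passes through one 2-bit adder.

    Returns (sum, carry_out, clocks). The carry out of each pass is
    latched and fed into the next pass, low bits first.
    """
    result = 0
    carry = carry_in & 1            # models the carry register
    clocks = 0
    for shift in range(0, 8, 2):    # bits 1:0, then 3:2, 5:4, 7:6
        partial = ((a >> shift) & 0b11) + ((b >> shift) & 0b11) + carry
        result |= (partial & 0b11) << shift
        carry = partial >> 2        # at most 1, since 3 + 3 + 1 = 7
        clocks += 1
    return result, carry, clocks


print(add8_two_bit_slice(0xFF, 0x01))  # (0, 1, 4)
print(add8_parallel(0xFF, 0x01))       # (0, 1)
```

The slice version needs four clocks per add, so it only breaks even if the short carry path lets the clock run roughly four times faster than the 8-bit version would allow.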
Dick
-----Original Message-----
From: Alex Knight <aknight(a)mindspring.com>
To: Discussion re-collecting of classic computers
<classiccmp(a)u.washington.edu>
Date: Friday, August 27, 1999 9:58 AM
Subject: Re: FPGAs and PDP-11's
Hi,
Another data point w.r.t. implementing microprocessors in FPGAs
involves the 6502: When Altera was initially rolling out their 10K
family of FPGAs, one of their marketing charts shows how they
built a 6502 processor inside a 10K50 device using only 7% of
the FPGA resources.
Regards,
Alex Knight
Calculator History & Technology Web Page
http://aknight.home.mindspring.com/calc.htm
At 06:05 PM 8/26/99 -0700, Chuck wrote:
>I did a preliminary "floor plan" for the PDP-8 and it used just under 1/3
>of the 4010 (or 75% of a 4005 given the routing issues, which leaves
>enough to do an M8660 serial port.)
>
>--Chuck
>
>