Paul Koning wrote:
...
cycle N-1. This could produce weird looking code like
this:
clr foo ; bne bar
because the bne would react to the result of the previous line's ALU
operation. It takes an unusually strange mind to cope with an
architecture like this. (No wonder Richie Lary liked it...)
heh. try the TI TMS320C6000 DSP (the 642 in particular). It has 6
ALUs and can execute 8 instructions at one time. The loads and stores
are delayed 4 cycles. If you access a register too soon you get the old
data. oops. no pipe stalls for you, thank you very much.
after using the 642 the MIT lisp machine microcode was a snap :-) it
only delays ALU results and data fetches 1 cycle. it's often hard to
find enough work to fill 4 delay slots without nops.
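
To make the delay-slot behaviour concrete, here is a toy Python model. It is only a sketch, not TI's actual pipeline: it assumes nothing beyond what is said above, namely that a load's result lands 4 instructions late on the C6000 and 1 instruction late in the Lisp machine microcode, and that reading too soon silently returns the old data. The class name `DelayedLoadCPU`, the helper `reads_after_load`, the register name `a4`, and the values 111/222 are all made up for illustration.

```python
class DelayedLoadCPU:
    """Toy machine: a load issues now, its result lands `delay_slots` instructions later.

    There are no interlocks: reading the destination register too early
    returns the old contents instead of stalling.
    """

    def __init__(self, delay_slots):
        self.delay_slots = delay_slots
        self.cycle = 0
        self.regs = {}
        self.pending = []  # list of (ready_cycle, register, value)

    def _tick(self):
        # Advance one cycle and commit any loads whose delay has elapsed.
        self.cycle += 1
        still_pending = []
        for ready, reg, value in self.pending:
            if ready <= self.cycle:
                self.regs[reg] = value
            else:
                still_pending.append((ready, reg, value))
        self.pending = still_pending

    def load(self, reg, value):
        # The next `delay_slots` instructions still see the old register value.
        self.pending.append((self.cycle + self.delay_slots + 1, reg, value))
        self._tick()

    def nop(self):
        self._tick()

    def read(self, reg):
        value = self.regs.get(reg, 0)  # whatever is in the register right now
        self._tick()
        return value


def reads_after_load(delay_slots, count):
    """Issue one load, then read the destination register `count` times in a row."""
    cpu = DelayedLoadCPU(delay_slots)
    cpu.regs["a4"] = 111      # old data (hypothetical register and value)
    cpu.load("a4", 222)       # new data, arrives late
    return [cpu.read("a4") for _ in range(count)]


if __name__ == "__main__":
    print(reads_after_load(4, 5))  # [111, 111, 111, 111, 222]
    print(reads_after_load(1, 3))  # [111, 222, 222]
```

With 4 delay slots the first four reads come back stale, which is why those slots have to be filled with other useful work or with nops; with 1 delay slot there is only a single instruction to fill.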
(don't get me wrong, I love bit slice ALUs and the 2901 in particular.
I used to sleep with a copy of "Mick & Brick", one of my favorite books.)
-brad