I'm certain that Paul has done his share of this, but an art on the CDC
6600 was hand-scheduling instruction execution. There was at least one
class for this--and probably more. The CPU could issue one instruction
every cycle, assuming that there were no conflicts. The 6600 had
several functional units whose operation could overlap.
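
As a rough illustration of why the ordering mattered, here is a toy Python model of in-order issue with independent functional units. Everything in it is an assumption made for the sketch: the `cycles` function, the register names, one unit per operation type, and the latency table are illustrative, and the model ignores most of what the real scoreboard tracked.

```python
# Toy model only: one unit of each kind, illustrative latencies in cycles.
LATENCY = {"mul": 10, "add": 4}

def cycles(program):
    """Return the cycle at which the last result is ready.

    program is a list of (unit, dest, sources) tuples. Issue is in order,
    at most one instruction per cycle, and stalls while the needed unit is
    busy. An issued instruction holds its unit until its result is done.
    """
    reg_ready = {}   # register -> cycle its value becomes available
    unit_free = {}   # unit -> cycle it can accept another instruction
    clock = 0        # earliest cycle the next instruction may issue
    finish = 0
    for unit, dest, srcs in program:
        issue = max(clock, unit_free.get(unit, 0))
        # The instruction waits inside the unit for its operands.
        start = max([issue] + [reg_ready.get(s, 0) for s in srcs])
        done = start + LATENCY[unit]
        reg_ready[dest] = done
        unit_free[unit] = done
        clock = issue + 1
        finish = max(finish, done)
    return finish

# Same four operations, same dependencies, two orderings.
naive = [
    ("mul", "x1", ("a", "b")),
    ("add", "x2", ("x1", "c")),   # ties up the adder while waiting on x1
    ("add", "x3", ("d", "e")),
    ("mul", "x4", ("x3", "f")),
]
scheduled = [
    ("add", "x3", ("d", "e")),
    ("mul", "x1", ("a", "b")),
    ("mul", "x4", ("x3", "f")),
    ("add", "x2", ("x1", "c")),
]

print(cycles(naive), cycles(scheduled))  # prints "28 21"
```

In this sketch the naive order parks an add in the adder while it waits for a multiply result, which blocks the independent add behind it; moving the independent work first removes that conflict.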
But we've discussed this before...
On the large vector STAR-100, operands were fetched over a 512-bit-wide
memory bus (not counting error-checking bits) and fed to pipelined vector
units. The trick there was not so much scheduling scalar instructions
as avoiding "bubbles" in the vector pipes.
--Chuck