On Jun 15, 2024, at 1:41 PM, Chuck Guzis via cctalk
<cctalk(a)classiccmp.org> wrote:
> I'm certain that Paul has done his share of this, but an art on the CDC
> 6600 was hand-scheduling instruction execution. There was at least one
> class for this--and probably more. The CPU could issue one instruction
> every cycle, assuming that there were no conflicts. The 6600 had
> several functional units whose operation could overlap.
I learned it from reading OS code and adopted some of it for my own work, but not much,
because I actually only worked on the 6500 -- which doesn't have multiple functional
units.
Writing good code for those machines was further complicated by the fact that instructions
were either 1/4 or 1/2 word long, could not be split across word boundaries, and branches
could only go to the start of a word. So there tended to be NOPs to pad out the word,
which the assembler would supply. Avoiding those NOPs would make the code go faster and of
course make it smaller.
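To make the padding rule concrete, here is a rough sketch in Python. It is only a model of the rule as described above, not the real assembler: the function name pack, the tuple layout, and the idea of counting in quarter-word "parcels" are assumptions made for illustration, and it ignores any other cases where the assembler might force a new word.

```python
PARCELS_PER_WORD = 4  # model: a word is four quarter-word parcels


def pack(instrs):
    """Lay out instructions in order and count the NOP padding needed.

    instrs is a list of (size, is_branch_target) tuples, where size is
    1 (quarter word) or 2 (half word). Returns (words_used, nop_parcels).
    Hypothetical helper for illustration only.
    """
    pos = 0   # next free parcel, counted from the start of the code
    nops = 0  # parcels filled with NOPs
    for size, is_target in instrs:
        used = pos % PARCELS_PER_WORD
        free = PARCELS_PER_WORD - used
        # Pad to the next word if the instruction would straddle a word
        # boundary, or if it is a branch target not at the start of a word.
        if used and (is_target or size > free):
            nops += free
            pos += free
        pos += size
    words = -(-pos // PARCELS_PER_WORD)  # ceiling division
    return words, nops


# Same five instructions, two different orders.
sloppy = [(1, False), (2, False), (2, False), (1, False), (2, False)]
tidy = [(1, False), (1, False), (2, False), (2, False), (2, False)]
print(pack(sloppy))  # three words, two NOP parcels
print(pack(tidy))    # two words, no NOPs
```

In this model the reordered sequence fits in two words instead of three, which is the kind of saving meant by avoiding the NOPs. Real code can only be reordered where the data dependencies allow it, which the sketch does not check.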
The other complication was a fairly limited set of registers, and the fact that loads
would go only to X1..X5 while stores could only come from X6 or X7. So a memcpy would
involve a register to register transfer. That takes 3 cycles on a 6600, so a skillful
memcpy implementation would use two load registers, both store registers, and two separate
functional units for the R-R move (one via the "boolean" unit and one via the
"shift" unit). I remember my bafflement the first time I saw a shift (by zero)
used to do just a register to register move; on a 6500 you wouldn't have any reason to
write that.
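A toy timing model, in Python, shows why spreading the register-to-register moves over two functional units helps. The assumptions are loud ones: instructions issue in order, at most one per cycle; a unit stays busy for the full 3 cycles of a move and cannot accept another until it is done; loads, stores, and register conflicts are ignored entirely. The function name finish_cycle and its parameters are made up for this sketch.

```python
def finish_cycle(n_moves, n_units, busy=3):
    """Cycle at which the last of n_moves register moves completes.

    Moves are handed to the units round-robin. Toy model only: in-order
    issue, one instruction per cycle, each unit busy for `busy` cycles.
    """
    free_at = [0] * n_units  # cycle at which each unit is next available
    cycle = 0                # earliest cycle the next instruction may issue
    finish = 0
    for i in range(n_moves):
        unit = i % n_units
        start = max(cycle, free_at[unit])  # wait for the unit if it is busy
        free_at[unit] = start + busy
        finish = max(finish, start + busy)
        cycle = start + 1                  # next issue is at least one cycle later
    return finish


print(finish_cycle(4, 1))  # one unit: each move waits for the previous one
print(finish_cycle(4, 2))  # two units: moves overlap in pairs
```

Under these assumptions, four moves through a single unit finish at cycle 12, while alternating between two units finishes at cycle 7. That is the motivation for doing one move in the "boolean" unit and the other in the "shift" unit.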
I once crashed the PLATO system in mid-day, when the load hit peak (600 users logged on)
because I had slowed down a critical terminal output processing step and the machinery
didn't have flow control there. My bosses were NOT happy. I solved the issue by
cleaning up that block of code to avoid all NOPs; the result was that it was both shorter
and faster than the previous version while still delivering the new feature. :-)
paul