On Jun 6, 2018, at 6:30 PM, Jim Manley via cctalk
<cctalk at classiccmp.org> wrote:
...
> Seymour Cray was a genius because he observed that the fastest possible
> circuit is a wire, and that if you use the same length of wire for each bit
> in a word in cables between stages in a computer that you want to go as
> fast as possible, all of the bits will arrive at the next stage at the same
> time. That was important for computers that took up entire rooms, with
> circuit boards and racks far enough apart that upwards of a dozen
> nanoseconds could elapse as bits passed from stage to stage.

Yes. "Same length wire" is how I first heard it. When I started reading the
6600 wire lists I discovered that the reality is far messier. The PPUs aren't too
bad: they use a 4 phase clock, and consecutive stages are usually clocked 50 or 75 ns
apart. For example, consecutive stages of the barrel are (mostly) clocked 75 ns
apart.
The CPU is much worse. It sometimes has 4 clock phases, but some parts look more like 6
or 7. If you study the block diagrams you'll see clock signals annotated by their
offset from the zero reference, in increments of 5 ns. Not quite all of the 20 possible
offsets are used, but more than half are. And it matters, as I found out the hard way
while trying to get a VHDL model to work. The model includes the nominal gate delay (5
ns) and, for "long enough" wires, the wire delay rounded to a multiple of 5 ns. That
mostly works. Replacing the clock tree with explicitly coded clock signals at 5 ns
multiples makes it better. Sometimes the documented value isn't quite right and you
need to move things by 5 ns. Sometimes the same document gives two different timings
for the same clock signal on different pages.
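
To make the bookkeeping described above concrete, here is a minimal Python sketch. It is not the VHDL model itself. The 5 ns gate delay and 5 ns granularity come from the text; the 100 ns cycle is inferred from "20 possible offsets" at 5 ns each; the round-to-nearest rule and the names round_to_step, arrival_offset and slack_ns are assumptions made for illustration.

```python
import math

STEP_NS = 5        # timing granularity used on the block diagrams
CYCLE_NS = 100     # inferred: 20 possible offsets x 5 ns
GATE_DELAY_NS = 5  # nominal gate delay from the text


def round_to_step(delay_ns: float) -> int:
    """Round a wire delay to the nearest 5 ns multiple (assumed rule).

    Short wires round to 0, so only "long enough" wires add delay.
    """
    return int(math.floor(delay_ns / STEP_NS + 0.5)) * STEP_NS


def arrival_offset(launch_offset_ns: int, gates: int, wire_delays_ns) -> int:
    """Offset within the cycle at which a signal reaches the next stage."""
    total = launch_offset_ns + gates * GATE_DELAY_NS
    total += sum(round_to_step(w) for w in wire_delays_ns)
    return total % CYCLE_NS


def slack_ns(arrival_ns: int, capture_offset_ns: int) -> int:
    """Wait between data arrival and the capturing clock edge.

    A value close to CYCLE_NS means the edge was just missed and the data
    is only captured on the next cycle.
    """
    return (capture_offset_ns - arrival_ns) % CYCLE_NS
```

With a hypothetical path launched at offset 10 through three gates and wires of 1.2, 6.0 and 13.0 ns, arrival_offset(10, 3, [1.2, 6.0, 13.0]) gives 45. A capturing clock documented at offset 50 leaves 5 ns of slack; if the real clock is at 40, slack_ns reports 95, which is the kind of 5 ns discrepancy that turns a working path into a broken one.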
Oh yes, then there are amazingly nutso things like flip-flops (which in the 6600 are R/S
flops) where the R and S inputs are asserted at the same time, with pulses that both rise
and fall at roughly the same time. I can fudge that. But it makes me wonder how the 6600
ever worked in the wild.
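
Why simultaneous R and S pulses are troublesome can be seen in a toy model. The sketch below steps a generic cross-coupled NOR latch with one unit of delay per gate. This is an assumption chosen for illustration, not the actual 6600 circuit, and simulate_latch is a hypothetical name.

```python
def simulate_latch(s_wave, r_wave, q=0, qn=1):
    """Step an idealized cross-coupled NOR latch.

    Each entry in s_wave/r_wave is one gate delay. Both gates update at
    once from the previous outputs, as in a unit-delay simulation.
    Returns the list of (q, qn) pairs after each step.
    """
    history = []
    for s, r in zip(s_wave, r_wave):
        q, qn = int(not (r or qn)), int(not (s or q))
        history.append((q, qn))
    return history


# R and S rise and fall together: the outputs never become complementary.
both = simulate_latch([1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0])

# S falls one gate delay after R: the latch settles in the set state.
s_last = simulate_latch([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0])

# R falls one gate delay after S: the latch settles in the reset state.
r_last = simulate_latch([1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0])
```

In this idealized model the final state is decided entirely by which pulse ends last, and when both end in the same step the outputs flip between (1, 1) and (0, 0) forever. Real hardware resolves the race through small analog differences, which a gate-level model can only imitate by skewing one input.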
For example, in the instruction issue and instruction stack logic I can get an
"eq *-2" to work the first time and the second time but not the third. The "not in
stack" case works, and "in stack" works the first time because the "inch" is still
underway. By the second time the inch has stopped, my timing is off by 5 ns, and
things go utterly haywire. By tweaking nanoseconds I can move the problem, but I
have not yet made it go away.
One place that does offer a fair amount of sanity is cross-cabinet connections; the coax
cables are all the same and the nominal delay including driver and receiver is 25 ns,
which nicely matches the semi-standard 4 phase clock. So tracing signals around a pile of
cabinets works pretty well. Some of the data paths are quite amazing, because the memory
fan-in and fan-out are routed through several other cabinets and the control signals are
routed separately yet again. Tracking the signals for an exchange operation, for example,
is quite an impressive dance of actions being launched fairly long in advance so they get
to the memory latches at just the right time.
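
The cross-cabinet arithmetic above is simple enough to sketch in a few lines of Python. The 25 ns per hop figure is from the text; treating the 4 phases as evenly spaced 25 ns apart and numbering them 0 to 3 are assumptions, and the function names are hypothetical.

```python
PHASES = 4   # semi-standard 4 phase clock
HOP_NS = 25  # nominal coax delay including driver and receiver


def arrival_phase(launch_phase: int, hops: int) -> int:
    """Clock phase at which a signal arrives after crossing `hops` cables."""
    return (launch_phase + hops) % PHASES


def launch_for(target_phase: int, hops: int):
    """Phase to launch on, and lead time in ns, to hit `target_phase`."""
    return (target_phase - hops) % PHASES, hops * HOP_NS
```

For instance, launch_for(2, 5) returns (1, 125): a signal that must reach its destination on phase 2 after five cabinet hops has to leave on phase 1, 125 ns earlier, which is more than a full clock cycle in advance.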
paul