On 01/09/2023 19:39, Paul Koning via cctalk wrote:
> There's actually a pull in two opposite
> directions. One is to put more stuff within a chip (System On Chip approach) and make the
> interconnects inside very wide, perhaps an entire L1 cache line wide. The
> Raza/NetLogic/Broadcom XLR and its successors are a good example, very nice MIPS-64 SOCs.
> The other is to do off-chip interconnects serially at very high clock rates.
Indeed. With internal interconnects it's quite easy to keep the
propagation delay consistent. In a similar fashion (I believe, correct
me if I'm wrong) most memory buses are also parallel. It's easy to keep
the propagation delay consistent when it's essentially baked into the
product. Things like expansion buses and "external" buses like USB can't
guarantee consistent propagation delay across their lines, and as such
can't be clocked anywhere near as fast in parallel form.
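The skew argument can be made concrete with a back-of-envelope calculation. A sketch (all numbers here are illustrative assumptions, not from any particular bus spec):

```python
# Back-of-envelope: how line-to-line skew caps a parallel bus clock.
# All figures below are illustrative assumptions, not real bus specs.

def max_clock_hz(skew_s: float, setup_hold_s: float) -> float:
    """Crudest possible model: the bit period must exceed the
    worst-case skew between lines plus the receiver's
    setup/hold window, or bits on different wires smear
    into adjacent clock cycles."""
    return 1.0 / (skew_s + setup_hold_s)

# Length-matched traces on a board: assume ~1 ns of residual skew.
print(max_clock_hz(1e-9, 0.5e-9) / 1e6, "MHz")

# A long external cable where skew grows to ~10 ns:
print(max_clock_hz(10e-9, 0.5e-9) / 1e6, "MHz")
```

Under these (made-up) numbers, the on-board bus could clock in the hundreds of MHz while the external cable is stuck below ~100 MHz, which is the effect being described.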
> Of course there are cases where serial isn't fast
> enough. The fastest Ethernets are an example, with their multi-lane transceiver buses.
> Another is the JESD204 standard, used in signal processing to connect A/D and D/A
> converters, where you might be looking at multiple analog data streams, 14-16 bits wide,
> multiple Gsamples/second. That might take 2-8 serial links working together. For those,
> there isn't a requirement for alignment of the bits across the wires, instead the data
> streams are reconstructed serially for each lane and then aligned properly to form the
> words. So within reason the lanes may have different propagation delay and still work.
> paul
This is essentially how PCIe works. It's easier to take multiple
high-speed serial streams and reconstruct the data afterwards than it is
to operate those lanes synchronously. The logic is simpler, and the
bandwidths are higher.
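A toy model of that idea (pure Python, nothing PCIe-specific; the single alignment marker stands in for the comma/skip symbols real links use): stripe bytes across lanes, let each lane arrive with a different delay, then deskew each lane independently before reassembling the stream.

```python
# Toy model of multi-lane striping with per-lane deskew.
# The marker byte and delay scheme are illustrative inventions,
# not the actual PCIe or JESD204 alignment mechanism.

MARKER = 0xBC  # assumed alignment symbol; payload must not contain it

def stripe(data: bytes, lanes: int) -> list[list[int]]:
    """Transmitter: round-robin bytes across lanes, each lane
    prefixed with an alignment marker."""
    out = [[MARKER] for _ in range(lanes)]
    for i, b in enumerate(data):
        out[i % lanes].append(b)
    return out

def add_skew(lane: list[int], delay: int) -> list[int]:
    """Channel: each lane sees a different propagation delay,
    modelled here as idle (zero) symbols in front."""
    return [0] * delay + lane

def deskew_and_merge(lanes_rx: list[list[int]]) -> bytes:
    """Receiver: find the marker on every lane independently,
    then interleave the symbols that follow it."""
    aligned = [lane[lane.index(MARKER) + 1:] for lane in lanes_rx]
    depth = min(len(lane) for lane in aligned)
    out = []
    for i in range(depth):
        for lane in aligned:
            out.append(lane[i])
    return bytes(out)

payload = b"parallel-over-serial"
tx = stripe(payload, lanes=4)
# Four lanes, four different (arbitrary) delays:
rx = [add_skew(lane, d) for lane, d in zip(tx, (3, 0, 5, 1))]
print(deskew_and_merge(rx))  # b'parallel-over-serial'
```

The point of the sketch: each lane is recovered on its own, so the lanes' delays never have to match, only the alignment logic has to find the markers.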
Generally, the serial vs. parallel problem has been solved by running
multiple serial streams in parallel. True parallel buses are really only
useful when latency matters more than throughput: serial buses can be
very fast, but reconstructing the stream into words still takes a
significant amount of time, however quickly each individual step runs.
For most uses, that kind of low-latency performance just isn't needed.
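A rough comparison of single-word latency makes the trade-off visible. A sketch with assumed figures (the SerDes latency in particular varies widely and is purely a placeholder here):

```python
# Rough latency to deliver one 64-bit word. All figures are
# illustrative assumptions, not measured values from real hardware.

WORD_BITS = 64

# Parallel: 64 wires at a modest clock deliver a word per edge.
parallel_clock_hz = 100e6
parallel_ns = 1e9 / parallel_clock_hz  # one cycle

# Serial: 10 Gb/s line rate with 8b/10b coding (80% payload
# efficiency), plus a fixed deserialization/alignment pipeline,
# assumed to be ~25 ns.
line_rate_bps = 10e9
payload_rate_bps = line_rate_bps * 8 / 10
serdes_latency_ns = 25.0  # assumed; real SerDes pipelines vary
serial_ns = WORD_BITS / payload_rate_bps * 1e9 + serdes_latency_ns

print(f"parallel: {parallel_ns:.1f} ns")
print(f"serial:   {serial_ns:.1f} ns")
```

Under these made-up numbers the slower-clocked parallel bus still wins on single-word latency, because the serial link's fixed reconstruction pipeline dominates, even though the serial link's per-pin throughput is far higher.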
Josh