On Mon, 25 Oct 1999, Carlos Murillo wrote:
At 07:56 PM 10/24/99 -0700, you wrote:
>Excuse me? Could you please back up this assertion with data? After all,
>at -some- point, all these busses have to get their data into/out of the CPU,
>right? And -that- is a "bottleneck" for sure... (Sure, you can have
>channel-to-channel I/O, but most apps are not just shuffling bits.)
Well ... I have some experience with high-speed switches and crossbars
in parallel supercomputers (as a user). The fallacy in your thinking is
that you believe that moving data around is not "processing".
It's kinda like "simulfax shuffle time" (if you're a Firesign Theatre
fan...).
You still think that the real processing takes place only at the cpu. Matter of fact
is that, in the real world, as data goes through each driver/buffer and
process in the OS on its way to the process that will actually do something
with it (i.e., actual "integer-op-related" cpu time) there are usually several
large block transfers. If all of this can happen without hogging the cpu
(and you need hardware to do it) you can bet that the corresponding machine
will be many times faster than a machine with a PCI bus.
Sure.
But wouldn't it be better to just put the data in the right memory locations
in the first place?
I once read that the average number of moves for net data (after it is in
memory), on its way from input through the tcp/ip stack and the OS to the
application, is on the order of 4.x ... I think in some Sun literature...
There are some papers about "Zero-Copy" TCP/IP implementations. Scatter/Gather
DMA is helpful. People are aware of this problem.
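
To make the gather idea concrete, here is a minimal sketch of the user-space
analogue: one writev() call that sends a header and a payload from two
separate buffers, without first joining them into a third. It is not DMA
itself, just the same "list of buffers" idea at the system-call level. It
assumes a Unix-like system (os.writev is not available everywhere), and the
names gather_write and demo_gather are made up for illustration.

```python
import os

def gather_write(fd, header, payload):
    """Write header and payload with one writev() call, without joining them."""
    # os.writev takes a sequence of bytes-like objects; the kernel is handed
    # a list of (address, length) pairs, so no combined user-space buffer
    # is built before the write.
    return os.writev(fd, [header, payload])

def demo_gather():
    """Send two separate buffers down a pipe and read back what arrived."""
    r, w = os.pipe()
    try:
        header = b"LEN=5;"
        payload = bytearray(b"hello")
        written = gather_write(w, header, payload)
        received = os.read(r, 64)
    finally:
        os.close(r)
        os.close(w)
    return written, received
```

The naive alternative, os.write(fd, header + payload), builds a temporary
buffer holding both pieces, which is exactly the kind of extra move being
counted above.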
I claim that excessive buffer-to-buffer copies are usually an indication of
poorly designed s/w.
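
A small sketch of what avoiding those copies can look like in practice,
assuming a made-up 4-byte header followed by a body. The function hands out
views onto the one original buffer instead of slicing it into new ones. The
names split_in_place and demo_in_place are hypothetical.

```python
def split_in_place(buf, header_len=4):
    """Return (header, body) views that share memory with buf; nothing is copied."""
    view = memoryview(buf)
    # Slicing a memoryview yields another view onto the same storage,
    # whereas slicing a bytes or bytearray object would allocate a copy.
    return view[:header_len], view[header_len:]

def demo_in_place():
    """Show that a write through the body view lands in the original buffer."""
    buf = bytearray(b"HDR:data")
    header, body = split_in_place(buf)
    body[0] = ord("D")  # modifies buf itself, not a private copy
    return bytes(header), bytes(buf)
```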
(The only way I can tie this with old iron is that on really slow machines
like the 1620 or 1130 or 1401, you'd have to be very careful to not copy
and copy and copy those buffers; you'd spend a lot of time figuring out how
not to copy 'em, just because big buffer copying was so expensive. Heck,
on the 1401, we'd manipulate data directly in locations 201-332, which was
where the 132-column 1403 print buffer was, just so that we could max out
that 600-LPM beauty of a printer!)
-mac