Salut again,
Now imagine something like a PDP-11/70 or a VAX. The CPU has more than 18
address bits and memory lives on another bus than the UniBus. The OS
needs the data somewhere in memory. If the bus bridge does no address
translation a UniBus peripheral can reach only the lower 256 kB of RAM.
So if output has to happen the OS needs to copy the data somewhere into
the lower 256 kB RAM. Then it needs to initiate the I/O. If this has
finished (interrupt) it needs to copy possible input from the lower 256
kB to somewhere else in RAM where the OS needs it.
This is ugly. All data needs to be touched twice by the CPU.
So what? *g*
From a performance point of view, this may be a bit (not too much) ugly.
But from the usability point of view, I don't see any great problems.
Usability means to be able to hook the real Unibus to a PC style machine.
What 22 bit PDP-11s and VAXen do is something like this: The UniBus
bridge has an I/O Memory Management Unit built in. DEC calls it the
UniBus map. The UniBus map has a page table. If a UniBus master
initiates a memory access the UniBus map takes the address issued by the
bus master. It splits the UniBus address in a lower and upper part to
construct the CPU side address. The lower part is 9 bits (= 512 bytes,
the VAX page size). These lower 9 bits are copied 1:1 from the UniBus
address into the CPU side address. The upper 9 bits of the UniBus
address are used as an index into the IOMMU page table. The page table
entry contains some permission bits (read, write, ...) and the upper 23
bits of the 32 bit CPU address. This way an 18 bit UniBus address is
translated into a 32 bit CPU address.
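
The translation described above can be sketched in C roughly as follows. This is only an illustration under assumptions: the names (ubmap, ubmap_pte, ubmap_translate) and the entry layout with a valid flag and a single write-permission flag are made up for the example and are not DEC's actual register format. Only the 9/9 bit split and the 23 bit upper part are taken from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UBMAP_OFFSET_BITS 9u                              /* 512-byte page */
#define UBMAP_ENTRIES     (1u << 9)                       /* upper 9 bits index the table */
#define UBMAP_OFFSET_MASK ((1u << UBMAP_OFFSET_BITS) - 1u)

/* Hypothetical page table entry layout; the real hardware format differs. */
typedef struct {
    bool     valid;     /* entry has been programmed by the OS */
    bool     writable;  /* bus master may write through this entry */
    uint32_t frame;     /* upper 23 bits of the CPU-side address */
} ubmap_pte;

typedef struct {
    ubmap_pte pte[UBMAP_ENTRIES];
} ubmap;

/* Translate an 18-bit UniBus address into a 32-bit CPU-side address.
 * Returns false if the entry is invalid or the access is not permitted. */
static bool ubmap_translate(const ubmap *map, uint32_t unibus_addr,
                            bool is_write, uint32_t *cpu_addr)
{
    assert(unibus_addr < (1u << 18));

    uint32_t offset = unibus_addr & UBMAP_OFFSET_MASK;    /* copied 1:1 */
    uint32_t index  = unibus_addr >> UBMAP_OFFSET_BITS;   /* page table index */
    const ubmap_pte *pte = &map->pte[index];

    if (!pte->valid || (is_write && !pte->writable))
        return false;

    *cpu_addr = (pte->frame << UBMAP_OFFSET_BITS) | offset;
    return true;
}
```

In this sketch the OS would fill in the pte[] entries covering its I/O buffer before starting the transfer, and the bridge would call something like ubmap_translate() for every bus master cycle.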
Ok, it's an MMU for DMA. Would be no real problem to add such a feature.
But I still don't see the real value - except some theoretical satisfaction.
It would be easy to install a bus mapper between the Unibus and the PCI
master interface. Just some more work.
This is very elegant. If I/O has to be done the OS programs the UniBus
map with the appropriate translation and initiates the I/O.
No! That's not elegant. A workaround. A kludge!
Therefore: If you design a PCI-UniBus bridge _*PLEASE*_ implement a
proper IOMMU / UniBus map. Everything else will be an ugly kludge.
Just copy the VAX UniBus map and you are done. It does exactly what
is needed.
For whom? Who will actually *use* that? The BSD drivers on the PC? If
yes, it would be worth a thought. Otherwise: No way. I'll be glad to
have a stable interface to the Unibus. Bus master access Unibus->PCI is
simply not planned for the first version. I have to define achievable
milestones. And this feature is simply far beyond my needs. I want to
access the Unibus. And I don't have a VAX that should be able to program
my PC. Only 18 bit CPUs and peripheral controllers.
Worst case example:
Assume the most stupid access method on PCI: 8 bit single transaction
read access. That costs 4 cycles per byte. Let's add another cycle for
the sake of a bad implementation on the card. Makes 5.
Then we come to a read transfer rate of approx. 6.3 MByte/s on the bus.
This would allow for roughly 25 complete Unibus memory dumps per second.
Now a slightly more optimistic calculation:
We read the memory with 32 bit, 128 beat bursts (PCI's address
auto-increment feature seems to allow endless bursts).
Then we need one clock for four bytes and three extra clocks (address,
driver pause, transaction spacing). Makes 128+3=131 bus clocks for 512
bytes. Transfer rate is then 123MByte/s. For a complete Unibus memory
readout, it would take about 2ms. Or 492 complete Unibus reads/s.
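
The two estimates above can be reproduced with a few lines of C. This is only a sketch under assumptions: the figures only come out as quoted with a 33 MHz PCI clock and with "MByte" read as 2^20 bytes, neither of which is stated explicitly above. The function and macro names are made up for the example.

```c
#include <assert.h>

#define PCI_CLOCK_HZ 33.0e6              /* assumption: 33 MHz PCI clock */
#define MIB          (1024.0 * 1024.0)   /* assumption: MByte = 2^20 bytes */
#define UNIBUS_BYTES (256.0 * 1024.0)    /* full 18-bit Unibus address space */

/* Transfer rate in MByte/s for a transaction that moves `bytes` bytes
 * and occupies the bus for `clocks` PCI clocks. */
static double pci_rate_mib_per_s(double bytes, double clocks)
{
    assert(clocks > 0.0);
    return PCI_CLOCK_HZ * bytes / clocks / MIB;
}

/* How many complete 256 kB Unibus readouts fit into one second. */
static double unibus_dumps_per_s(double rate_mib_per_s)
{
    return rate_mib_per_s * MIB / UNIBUS_BYTES;
}
```

Calling pci_rate_mib_per_s(1, 5) gives the worst case (one byte per 5 clocks) and pci_rate_mib_per_s(512, 131) the burst case (512 bytes per 131 clocks). Feeding either result into unibus_dumps_per_s() gives the number of full readouts per second.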
So you have to add 2 ms to every full 256k portion of data transferred
by a Unibus DMA device. Assume a really fast device which continuously
holds bus mastership during operation and doesn't pause, and which is
capable of 40MB/s (all this is FAR beyond reality!): it would fill the
memory in 6.25msec. You would have to add 2 msec for that. So the overall
time would go up to 8.25msec per Unibus fill, which means a
transfer rate of 30MByte/s. Wouldn't be acceptable for me.
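
As a quick check of that estimate, here is a minimal sketch in C. The function name and parameters are made up for the example; it simply adds a fixed copy time to the time the device needs to fill the memory window.

```c
#include <assert.h>

/* Effective rate in MByte/s when every `window_mb` MByte moved by the
 * device at `device_mb_per_s` costs an extra `copy_s` seconds of copying. */
static double effective_rate_mb_per_s(double window_mb,
                                      double device_mb_per_s,
                                      double copy_s)
{
    assert(device_mb_per_s > 0.0 && copy_s >= 0.0);
    double fill_s = window_mb / device_mb_per_s;  /* time to fill the window */
    return window_mb / (fill_s + copy_s);
}
```

With a 0.25 MByte window, a 40 MByte/s device and 2 ms of copying, effective_rate_mb_per_s(0.25, 40.0, 0.002) comes out at roughly 30 MByte/s, as in the estimate above.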
Now let's look at a *real* Unibus disk drive, far faster than any RP,
RK, RL, TC, TM: RA81.
According to http://docs.freebsd.org/44doc/papers/diskperf.pdf, it can
transfer up to 2.2 MByte/s and has a minimum seek time of 6 msec. As it
has 456 MB on 1,248 tracks, one track has a capacity of roughly 1.5
times the Unibus capacity. If you now split one cylinder into two
portions, you will add at most 1ms extra transfer time/track.
Simply said: Forget about performance issues!
That would be a DMA bounce buffer in hardware. This is still ugly and
defeats the reason for DMA.
Yes, I would even do PIO on the Unibus.
As stated above, I will luckily live without :-)
You will, as it simplifies your hardware design.
Yes.
But it will be a nightmare for the OS programmer who has to fight with it
when writing a device driver for it.
As you read above, I see no reason to avoid copying data in the driver.
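The driver has to fight something like
driver_read(..., void *buffer, size_t size) {
    ...
    /* res = number of bytes the device put into the Unibus window */
    int res = get_data_from_somewhere(some_unibus_address);
    /* never copy more than the caller's buffer holds: min, not max */
    memcpy(buffer, unibus_base + some_unibus_address, min(res, size));
    ...
}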
Perhaps I have a too much simplified view on the topic. In that case,
I'm grateful for enlightenment!
Best wishes,
Philipp :-)