Johnny Billquist wrote:
Jerome Fine replies:
I am replying to Johnny's response, but I have read the other replies
as well.
Thank you all for your help.
The first point is that using a PEEK/POKE system call (EMT? RT-11 has
such a call) is so high in overhead that it becomes almost useless. In
fact, the key point about the use of EMEM.DLL under RT-11 is
efficiency. While it is possible to access normal "emulated PDP-11"
memory (using E11 on a 750 MHz Pentium III) in about 0.3 microseconds,
it takes about 1.2 microseconds, or about 4 times as long, to reference
an IOPAGE address in any way, including the PSW or the EMEM.DLL values.
Since this is a huge improvement over using a PEEK/POKE, it is even
worth giving up 8192 bytes of address space to an APR dedicated to the
IOPAGE for that purpose.
True. From an efficiency point of view, using system calls to
read/write memory is very inefficient.
Jerome Fine replies:
Which means that using a system call is useful only during
initialization. For example, RT-11 allows the user to .Peek/.Poke the
PSW, which I agree would be VERY unreasonable under RSX-11 and RSTS/E,
since they depend on the KERNEL code maintaining complete control.
However, under RT-11, the VBGEXE program sets the PREVIOUS DATA space to
user mode for reasons which I do not understand. Fortunately, it is
possible to use the .Poke system call to set the PREVIOUS DATA space
back to KERNEL, which is what RT-11 sets it to before VBGEXE is used.
When the PREVIOUS DATA space is KERNEL, it is possible to use the
2-instruction example given below. So a system call to .Poke during
initialization is entirely acceptable. In other words, with RT-11 it is
possible and easy to set the PREVIOUS DATA space in the PSW to KERNEL
even when VBGEXE is used; more to the point, without VBGEXE it is
actually unnecessary, since KERNEL is the default for a so-called
privileged job (which all programs are by default). This allows the
instruction:
    Mov  @#BaseReg,R0   ;Get the current value from PC memory
to be replaced by:
    MFPD @#BaseReg      ;Get the current value
    Mov  (SP)+,R0       ; from PC memory
with almost the same execution time. It also avoids dedicating the
8192 bytes mapped by APR7 to the IOPAGE registers.
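The cost of "giving up APR7" is plain address arithmetic: the top 3 bits of a 16-bit virtual address select one of 8 APRs, each covering 8192 bytes, and in 22-bit mode the physical address is the PAR value times 64 plus the 13-bit offset within the page. The Python model below is only a sketch of that arithmetic, not RT-11 code; the constant and function names are made up for illustration, and 0o177600 is the conventional PAR value for mapping a page onto the 22-bit I/O page.

```python
# Sketch of PDP-11 MMU address arithmetic in 22-bit mode (a model, not RT-11 code).
PAGE_BYTES = 8192      # each of the 8 APRs maps one 8 KB page of the 64 KB space
IOPAGE_PAR = 0o177600  # PAR value that maps a page onto the 22-bit I/O page
PSW_VA = 0o177776      # virtual address of the PSW when APR7 maps the I/O page


def split_va(va):
    """Split a 16-bit virtual address into (APR number, offset within page)."""
    if not 0 <= va <= 0o177777:
        raise ValueError("not a 16-bit address")
    return va >> 13, va & 0o17777


def physical(par, va):
    """22-bit physical address: the PAR counts 64-byte blocks; add the offset."""
    _, offset = split_va(va)
    return ((par << 6) + offset) & 0o17777777


apr, offset = split_va(PSW_VA)
print(apr, oct(offset))                      # 7 0o17776
print(oct(physical(IOPAGE_PAR, PSW_VA)))     # 0o17777776
print(8 * PAGE_BYTES, 7 * PAGE_BYTES)        # 65536 57344
```

The last line is the trade-off in numbers: 65536 bytes of user address space with all 8 APRs mapping user memory, 57344 bytes when APR7 is reserved for the I/O page.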
That's not possible with OSes that maintain any kind of protection
between processes, along with virtual memory.
The PSW as such cannot be manipulated. If it could be, you could also
change your mode to kernel even though it is currently something else.
Actually, you must be in kernel mode in order to modify the PSW with
any other instructions than SEx and CLx.
See my previous response with respect to RT-11 and my agreement that
under RSX-11 and RSTS/E, the PSW must NEVER be modified by a
user program or a .Poke requested by a user program.
Obviously, a system request avoids all of these problems, but at a
heavy cost in overhead, estimated at between 50 and 500 times that of
the two examples above.
That was roughly what I was thinking of when I asked if there was a
"fast method (only a few instructions)" to access an IOPAGE register.
Well, in RSX, you have a rather high overhead to set up the mapping
to the I/O page, unless it's already mapped in when the task starts.
But from there on, there is no overhead at all. It's located somewhere
in your 16-bit address space. (Note that you really don't have to map
the I/O page at APR7 in RSX. You can get it mapped anywhere if you use
the CRAW$/MAP$ or TKB options.)
That is helpful information. There might be times when APR7 should not
be used.
However, with normal privileged programs, the I/O page is always
present at APR7 even if you don't do anything.
Also VERY helpful. However, if so, is there any reason why APR7 could
not be mapped to user memory in the normal manner, so that the user
program has a full 65536 bytes of address space, while the PREVIOUS
DATA space is set to KERNEL, giving the user complete access to the
IOPAGE registers via the 2-instruction example that I gave at the
beginning? I can't see that there would be any greater loss of
security, since being able to change the IOPAGE registers either
directly or indirectly is just as damaging!
Please comment.
RSX had a bit more flexibility (opportunity) in this regard. I believe
you can set up a CRAW$ (create address window) directive in either
Macro or Fortran to achieve the desired result.
Yes, with reservations. CRAW$ (create address window) is part of
dynamically remapping your address space.
However, CRAW$ always required a named memory partition. You cannot
create an address window to an arbitrary memory address.
Also, the memory partitions have protections and ownership associated
with them.
On most systems, CRAW$ cannot get you access to the I/O page, simply
because normally you don't have an address space and a partition
associated with the I/O page.
But if such a partition is created, then CRAW$, in combination with
MAP$, would allow you to access the I/O page.
The same thing can also be achieved without CRAW$/MAP$, since you can
specify at task build time the mapping your task should have, with the
COMMON and RESCOM options to TKB.
This seems to be the answer if it is allowed. Obviously it does
require giving up the 8192 bytes that APR7 would otherwise map to user
space.
Correct.
But that is unnecessary if privileged jobs already have access to the
IOPAGE. Please comment.
There is also another option with E11 that I will make use of when I
have finished with the HD(X).SYS device driver for RT-11. It turns out
that if the memory is being accessed sequentially, the average time to
reference a single 16-bit value in the file under:
MOUNT HD: FOOBAR.DSK
is actually less than the time to get/store a single value under
EMEM.DLL when as few as 8 blocks (2048 words at a time) are being
referenced. Consequently, setting up a small 4096-byte buffer and the
associated code to handle the calls to the HD: device driver (all
standard calls to .ReadF and .WritF in RT-11) is actually more
efficient, since once the values are in the buffer inside the program,
they can be referenced and modified at "emulated PDP-11" memory speeds.
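The buffering idea is simple: pay for one transfer per 8 blocks, then serve the next 2048 words from the in-program buffer. A minimal Python sketch of the read side, assuming a generic binary file-like object in place of the HD: device and little-endian 16-bit words (PDP-11 byte order); the class name is hypothetical, and the RT-11 .ReadF/.WritF calls are not modelled.

```python
import io

BLOCK_BYTES = 512
CHUNK_BYTES = 8 * BLOCK_BYTES   # 8 blocks = 2048 words = 4096 bytes


class SequentialWords:
    """Read 16-bit little-endian words through a 4096-byte buffer.

    `dev` is any binary file-like object standing in for a file mounted
    as HD: (an assumption for illustration, not the RT-11 interface).
    """

    def __init__(self, dev):
        self.dev = dev
        self.buf = b""
        self.pos = 0          # byte offset of the next word within buf
        self.transfers = 0    # number of 8-block reads issued

    def next_word(self):
        if self.pos + 2 > len(self.buf):
            self.buf = self.dev.read(CHUNK_BYTES)   # one transfer per 8 blocks
            self.pos = 0
            self.transfers += 1
            if len(self.buf) < 2:
                raise EOFError("end of device")
        word = self.buf[self.pos] | (self.buf[self.pos + 1] << 8)
        self.pos += 2
        return word


data = b"".join(i.to_bytes(2, "little") for i in range(5000))
reader = SequentialWords(io.BytesIO(data))
words = [reader.next_word() for _ in range(5000)]
print(words == list(range(5000)), reader.transfers)   # True 3
```

Reading 5000 words costs only 3 transfers (4096 + 4096 + 1808 bytes); every other reference is an ordinary memory access.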
You mean that using a device driver, and a device that can access the
"normal" memory instead, is better. Well, I'm not surprised. What this
essentially amounts to is emulating DMA.
Actually, under E11, it is almost identical in principle to the VM:
device driver, which accesses "emulated normal PDP-11 extended memory".
The E11 command:
MOUNT HD: RAM:/SIZE:number-of-blocks
makes HD: into a Virtual Memory device which directly uses PC memory.
The average transfer time per word for even a few blocks (or a few
thousand words) from/into emulated user memory is a small fraction of
a normal memory access time.
In addition, if an operating system caches the blocks in a file, the same
speed is achieved.
Of course, the above solution for sequential references does not work
when the references are random, or when they are at regular but very
large intervals (thousands or even millions of values apart). For this
latter situation, it may be possible to modify EMEM.DLL so that a
single reference to an IOPAGE register modifies all of the specified
values (over a range of up to many billions of values).
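For a sieve, "modify all of the specified values" amounts to a strided store: a start position, a stride equal to the current prime, and a count. A small Python model of what one such command might do, assuming one byte per value; the function name and arguments are hypothetical and not part of EMEM.DLL.

```python
def mark_stride(mem, start, stride, count, value=0):
    """Hypothetical one-command update: set `count` cells, `stride` apart.

    Models what a single write to an extended EMEM.DLL register might do;
    the name and arguments are illustrative only. Cells past the end of
    `mem` are silently skipped.
    """
    sl = slice(start, min(start + stride * count, len(mem)), stride)
    n = len(range(*sl.indices(len(mem))))
    mem[sl] = bytes([value]) * n


flags = bytearray([1]) * 30          # 1 = "still a candidate prime"
mark_stride(flags, 9, 3, 7)          # strike 9, 12, ..., 27
print([i for i in range(30) if flags[i] == 0])   # [9, 12, 15, 18, 21, 24, 27]
```

The point of the design is that the PDP-11 issues one request per prime per pass, instead of one IOPAGE reference per value.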
Can't comment much, since I don't know exactly what you're trying to do.
But speedwise, if you really want something to act like a fast disk,
writing something that behaves like proper DMA is best.
You give the device a memory address, a length, and a destination
address on the device, and let it process the data as fast as it can,
without involving the PDP-11 after that point.
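The DMA model described here fits in a few lines: the request carries a memory address, a length and a device address, and the "device" moves the whole block in one step. A Python sketch, with bytearrays standing in for emulated PDP-11 memory and for the device; the names are illustrative, not an E11 interface.

```python
from dataclasses import dataclass


@dataclass
class DmaRequest:
    mem_addr: int    # byte address in emulated PDP-11 memory
    dev_addr: int    # byte address on the device
    length: int      # number of bytes to move
    to_device: bool  # True: memory -> device; False: device -> memory


def run_dma(req, pdp_mem, device):
    """Perform the whole transfer at once, with no PDP-11 involvement."""
    m, d, n = req.mem_addr, req.dev_addr, req.length
    if n < 0 or m < 0 or d < 0 or m + n > len(pdp_mem) or d + n > len(device):
        raise ValueError("transfer out of range")
    if req.to_device:
        device[d:d + n] = pdp_mem[m:m + n]
    else:
        pdp_mem[m:m + n] = device[d:d + n]


memory = bytearray(16)
disk = bytearray(range(64))
run_dma(DmaRequest(mem_addr=4, dev_addr=40, length=8, to_device=False), memory, disk)
print(list(memory[4:12]))   # [40, 41, 42, 43, 44, 45, 46, 47]
```

The bounds check matters in Python as well as on a real device: assigning a shorter slice would otherwise silently resize the bytearray.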
It is just a bit more complicated, since the memory address can be
anywhere in the 4 MB of emulated PDP-11 memory. So a 22-bit address is
required, which can be determined during initialization. Even better,
the code (only about a dozen instructions which set up the 6 IOPAGE
registers) can be in user space, which avoids the overhead of a system
call. And if you don't think that amounts to much, my benchmarks show
that transfers of just 8 blocks (2048 words or 4096 bytes) take about
half the time they take through a normal system call. With fewer
blocks, the advantage over system calls is even greater.
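Because PDP-11 registers are 16 bits wide, a 22-bit memory address has to be split across two registers. The split itself is certain; the layout of the 6 registers below is only an assumption for illustration, since the actual register assignment is not described here.

```python
def split22(addr):
    """Split a 22-bit physical address into (low 16 bits, high 6 bits)."""
    if not 0 <= addr < 1 << 22:
        raise ValueError("not a 22-bit address")
    return addr & 0o177777, addr >> 16


def register_image(mem_addr, dev_block, word_count, command):
    """Hypothetical contents of 6 device registers (layout is illustrative)."""
    mem_lo, mem_hi = split22(mem_addr)
    return [
        mem_lo,                        # memory address, bits 0-15
        mem_hi,                        # memory address, bits 16-21
        dev_block & 0o177777,          # device block number, low word
        (dev_block >> 16) & 0o177777,  # device block number, high word
        word_count & 0o177777,         # transfer length in words
        command & 0o177777,            # command/go bits
    ]


print([oct(w) for w in split22(0o17777777)])        # ['0o177777', '0o77']
print([oct(w) for w in register_image(0o1234567, 70000, 2048, 1)])
```

On the PDP-11 side, this is the handful of MOV instructions into the IOPAGE that replaces a system call.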
Of course, the result would no longer really be a PDP-11, except for
the controlling code, which would still be 99% of the required code,
since the EMEM.DLL changes are really quite trivial, yet consume 99% of
the execution time. In case anyone is wondering what I am referring
to, it is back to my other addiction: sieving for prime numbers. I
realize that I should probably switch to native Pentium code, but it
seems more of a challenge and much more fun to run as if a PDP-11 is
being used with a few GB of memory somewhere out there that can be
easily fiddled with, as if there were a very fast additional CPU
similar to those that used to be available for special math
applications. Anyone remember SKYMNK for FFTs?
Hmm, are you just creating a sieve for primes? OK, then you need large
memory somehow.
There are several ways of doing that. For your specific needs, a simple
device in the I/O page with a command register, an address register
and a data register would probably be just about the best.
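A Python model of the suggested device: the PDP-11 loads the address and data registers, then writes the command register to perform one access. The auto-increment of the address after each access is an assumed convenience for sequential work, and all names are hypothetical.

```python
class BigMemoryDevice:
    """Model of a hypothetical I/O-page device: command, address and data
    registers in front of a memory far larger than the PDP-11 can map."""

    CMD_READ = 1
    CMD_WRITE = 2

    def __init__(self, size_words):
        self.store = [0] * size_words
        self.address = 0   # word index; a real device would need 2+ registers
        self.data = 0

    def write_command(self, cmd):
        """Writing the command register performs one access, then
        auto-increments the address (an assumed convenience)."""
        if cmd == self.CMD_READ:
            self.data = self.store[self.address]
        elif cmd == self.CMD_WRITE:
            self.store[self.address] = self.data & 0o177777
        else:
            raise ValueError("unknown command")
        self.address += 1


dev = BigMemoryDevice(1 << 17)
dev.address, dev.data = 123456, 0o1234
dev.write_command(BigMemoryDevice.CMD_WRITE)
dev.address = 123456
dev.write_command(BigMemoryDevice.CMD_READ)
print(oct(dev.data), dev.address)   # 0o1234 123457
```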
For a demonstration program to sieve up to 10**12, I can use normal
PDP-11 memory for the work area of around 30 KB. The 2 arrays, which
will be used sequentially, will require 78,498 elements each of 32 bits
(4 bytes), a total of less than 1 MB; since they are used sequentially,
they can easily be read and written in groups of 2048 words (8 blocks).
For those not familiar with sieving for primes: a very large memory is
used! The problem is that sieving requires large amounts of storage,
accessed both sequentially and in what seems like a random order. One
array is used to store the primes being used. A second array is used to
store the next location to be used in the work area for that prime. The
work area is normally as large as possible and is accessed at intervals
equal to the current prime being processed.
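The scheme just described is a segmented Sieve of Eratosthenes. A small-scale Python sketch, not the author's program: `primes` is the first array, `nxt` the second, and `work` the work area, stepped through at intervals of each prime. For simplicity `nxt` holds absolute next multiples; a memory-tight version would store offsets relative to the current segment. The segment size of 4096 is an arbitrary choice.

```python
import math


def small_primes(limit):
    """Ordinary sieve of Eratosthenes: the sieving primes up to `limit`."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p::p] = bytes(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if flags[p]]


def segmented_count(n, segment=4096):
    """Count primes <= n, sieving a fixed-size work area segment by segment."""
    if n < 2:
        return 0
    primes = small_primes(math.isqrt(n))   # first array: the sieving primes
    nxt = [p * p for p in primes]          # second array: next multiple to strike
    count = 0
    for low in range(2, n + 1, segment):
        high = min(low + segment, n + 1)   # this segment covers [low, high)
        work = bytearray([1]) * (high - low)
        for i, p in enumerate(primes):
            m = nxt[i]
            while m < high:                # step through the work area by p
                work[m - low] = 0
                m += p
            nxt[i] = m                     # resume here in the next segment
        count += sum(work)
    return count


print(segmented_count(100), segmented_count(10**6))   # 25 78498
```

Only `work` is accessed "randomly"; `primes` and `nxt` are walked front to back once per segment, which is why they can live on a device and be streamed in 8-block chunks.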
These days, a sieve program up to a billion (10**9) is considered
trivial. Most individuals who are serious consider any range under a
trillion (10**12) to be in the nature of a toy. However, sieving up to a
trillion (where pi(10**12) = 37,607,912,018) uses all of the primes
below 10**6 = sqrt(10**12), and their next-location values in the second
array require more than 16 bits per element, so just the storage of the
second array of 78,498 elements is over 1/4 MB. And since pi(10**9) =
50,847,534 is the number of elements in the second array required to
sieve up to 10**18 (pi(10**18) = 24,739,954,287,740,860), for which 30
bits per element is just sufficient, the second array then requires
over 200 megabytes.
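The sizes above can be checked with a few lines of arithmetic. This sketch takes the prime counts from the text, assumes 4-byte (32-bit) storage per element as in the demonstration program, and assumes each next-location entry is smaller than its prime, so the largest sieving prime bounds the bits an entry needs.

```python
PI_1E6 = 78_498        # primes below sqrt(10**12) = 10**6 (from the text)
PI_1E9 = 50_847_534    # primes below sqrt(10**18) = 10**9 (from the text)
BYTES_PER_ELEMENT = 4  # 32-bit storage, as in the demonstration program

# Assumption: each next-location entry is smaller than its prime.
bits_1e12 = (10**6 - 1).bit_length()   # 20 bits: more than 16
bits_1e18 = (10**9 - 1).bit_length()   # 30 bits: "just sufficient"

second_array_1e12 = PI_1E6 * BYTES_PER_ELEMENT         # 313,992 bytes
both_arrays_1e12 = 2 * second_array_1e12               # 627,984 bytes
chunks_per_array = (second_array_1e12 + 4095) // 4096  # 8-block transfers
second_array_1e18 = PI_1E9 * BYTES_PER_ELEMENT         # 203,390,136 bytes

print(bits_1e12, bits_1e18)                                  # 20 30
print(second_array_1e12 > 2**18, both_arrays_1e12 < 2**20)   # True True
print(chunks_per_array, second_array_1e18 > 200_000_000)     # 77 True
```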
Of course, these memory sizes are no longer even very large for a
current Pentium III system (I have a Pentium III with 768 MB of memory),
and with a Pentium 4 they are only a small aspect of the problem.
However, for the PDP-11, they are obviously impossible. Thus my interest
in using E11 and the features that I have described.
Note that pi(10**22) is considered to be known, and pi(10**23) is likely
known, having been found this year. pi(10**24) has still not been
published, but will likely be known in the next year or two when faster
algorithms are found or faster CPUs are used. Sieve programs were not
used to find these values.
Sincerely yours,
Jerome Fine