I have to say this about the LCM VAX: it successfully ran CMUIP for a
rather long time. The pattern of failures, becoming more and more
frequent, seems consistent with a hardware issue rather than one of
software. The machine has never had a large load - when I watched it
regularly, there were rarely more than a handful of users at any one time.
The machine has been power-cycled in response to errors, which would of
course reinitialize all transient data structures - and I do not believe
that CMUIP uses persistent caches (i.e. cached to disk).
Yes, LCM could just load UCX and perhaps whistle a happy tune. It might be
an interesting experiment to do so and observe behavior - the changes in
the startup script could easily be commented in/out. I'd certainly like to
see them continue to use CMUIP for historical reasons. Multinet would also
be interesting, if policies have changed at LCM to provide for licensing
costs of that software. When I was restoring the machine back in 2008-9,
Process Software offered a somewhat amusing 'discount' for an educational
institution. -- Ian
On Wed, Nov 19, 2014 at 8:03 AM, Peter Coghlan <cctech at beyondthepale.ie> wrote:
One thing, though.
I don't think that the error code from the $QIO in the OPCOM log is a
VMS exit code. But I might be wrong on that.
But that could do with some more examining.
There is a poorly phrased entry in the CMU/IP FAQ which could give the
impression that CMU/IP uses its own error codes that are entirely different
from VMS status codes. What I think it is really trying to say is that, like
many VMS applications, CMU/IP defines _additional_ status codes for which VMS
does not already have suitable messages defined, and the text messages
associated with these are not available unless the appropriate CMU/IP-provided
message files are loaded.
Low-numbered error codes such as 1C (and another favourite, 0C, which is
%SYSTEM-F-ACCVIO, access violation) come from system services and runtime
library functions that are part of VMS, and the message texts are made
available automatically by VMS. It is not the case that CMU/IP reporting an
error code of 1C means something different from some part of VMS reporting
it. They both mean process quota exceeded.
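To make the point about low-numbered codes concrete, here is a small Python sketch of how an OpenVMS condition value splits into fields: severity in bits 0-2, message number in bits 3-15 and facility number in bits 16-27, with facility 0 being SYSTEM. The helper names are hypothetical, only the two SYSTEM codes mentioned above are in the lookup table, and the fallback text is an illustration rather than what VMS itself prints when a message file is missing.

```python
# Hypothetical helpers (not part of VMS or CMU/IP) for decoding an OpenVMS
# condition value into its standard fields.

# Severity lives in the low three bits; odd values are successes.
SEVERITY_LETTERS = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}

# Only the two SYSTEM (facility 0) codes discussed in this thread, keyed by
# the full status value for simplicity.
SYSTEM_MESSAGES = {
    0x0C: ("ACCVIO", "access violation"),
    0x1C: ("EXQUOTA", "process quota exceeded"),
}


def decode_condition(value: int) -> dict:
    """Split a 32-bit condition value into severity, message and facility."""
    return {
        "severity": SEVERITY_LETTERS.get(value & 0x7, "?"),  # bits 0-2
        "success": bool(value & 0x1),                         # low bit set
        "message_number": (value >> 3) & 0x1FFF,              # bits 3-15
        "facility": (value >> 16) & 0xFFF,                    # bits 16-27
        "customer_defined": bool(value & (1 << 27)),          # bit 27
    }


def format_status(value: int) -> str:
    """Render a status as %FACILITY-S-IDENT text when the text is known."""
    fields = decode_condition(value)
    if fields["facility"] == 0 and value in SYSTEM_MESSAGES:
        ident, text = SYSTEM_MESSAGES[value]
        return f"%SYSTEM-{fields['severity']}-{ident}, {text}"
    # Illustrative fallback only: without the right message file loaded,
    # there is no text to show, just the raw fields.
    return (
        f"facility {fields['facility']}, message {fields['message_number']}, "
        f"severity {fields['severity']} (no message text available)"
    )


print(format_status(0x1C))  # %SYSTEM-F-EXQUOTA, process quota exceeded
print(format_status(0x0C))  # %SYSTEM-F-ACCVIO, access violation
```

Both codes decode to facility 0 (SYSTEM) with severity F, so a 1C returned to CMU/IP carries the same meaning as a 1C from anywhere else in VMS. Codes that an application adds itself would normally carry a non-zero facility number and need that application's message file before any text can be shown.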
Directly underneath that entry in the FAQ, I found the following:
3.1.2 >>>> IPACP CRASH DUE TO QUOTA EXCEEDED [20-MAR-1995]
For systems with a high IP load, IPACP may occasionally crash with a quota
exceeded error. This does not refer to disk quota, but to one of the process
quota limits. Usually, the quota in question is BYTLM.
The default BYTLM provided for IPACP (65536) is sufficient for only about 20
connections. IPACP takes about 32000 bytes for itself and each connection
takes about 1872 bytes. This requirement is NOT currently documented.
To increase the BYTLM for the IPACP, modify the IP_STARTUP.COM procedure and
change the value of the /BUFFER_LIMIT qualifier on the RUN command that starts
the IPACP process. Then shut down and restart IPACP.
At the current time, there also appears to be a memory leak in IPACP which
has the effect of gradually reducing the available BYTLM over time. When this
gets close to zero, IPACP will hang (as it retries) and then crash soon
afterwards. It is therefore desirable to give IPACP more BYTLM than the
typical load might suggest. If this sort of crash is experienced, increase
the BYTLM by 50% and restart it.
<A.Harper at kcl.ac.uk>
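The FAQ's sizing rule is simple arithmetic, sketched below in Python using only its approximate figures: 32000 bytes of base overhead, 1872 bytes per connection, and the advice to raise BYTLM by 50% after a crash. The function names are hypothetical, the results are only as good as those 1995 estimates, and nothing here models the leak.

```python
# Hypothetical sizing helpers based on the approximate figures in the
# CMU/IP FAQ entry (all values in bytes).
IPACP_BASE_BYTES = 32000      # IPACP's own overhead, per the FAQ
BYTES_PER_CONNECTION = 1872   # per-connection cost, per the FAQ


def connections_supported(bytlm: int) -> int:
    """Whole connections that fit in BYTLM after IPACP's own overhead."""
    spare = max(bytlm - IPACP_BASE_BYTES, 0)
    return spare // BYTES_PER_CONNECTION


def bytlm_needed(connections: int) -> int:
    """Minimum BYTLM for a given number of simultaneous connections."""
    return IPACP_BASE_BYTES + connections * BYTES_PER_CONNECTION


def increase_after_crash(bytlm: int) -> int:
    """The FAQ's remedy after a quota crash: raise BYTLM by 50%."""
    return bytlm * 3 // 2


print(connections_supported(65536))  # 17
print(bytlm_needed(20))              # 69440
print(increase_after_crash(65536))   # 98304
```

With the default of 65536, these figures leave room for 17 whole connections (20 would need 69440), which is in line with the FAQ's "about 20". A fixed value computed this way cannot account for the leak described above, which is why the FAQ recommends extra headroom.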
Looks like my pagefile quota guess was wrong and the culprit is BYTLM.
However, I suspect the underlying cause of this problem has never been fully
addressed and increasing the quota will not help, or worse, will help for
about a week before the problem returns even more frequently.
I cannot overemphasise how much relief will be experienced on the replacement
of CMU/IP by something that works properly or even by something that doesn't
mess up as badly. Problems that you didn't even know you had will go away,
even ones which seemed unrelated to networking. On sunny days, the sun will
seem brighter and the sky bluer :-)
In my previous posting, I forgot to mention that you can also try:
$ MCR NCP SHOW KNOWN LINE COUNTERS
if running DECnet. This will give DECnet's view of any network media problems,
including those relating to other protocols going through the same network
adapter. It probably won't have much to say about hardware failures in the
network adapter, though. Remember that on a half-duplex Ethernet, collisions
are normal and expected, but late collisions indicate a problem.
Regards,
Peter Coghlan.
--
Ian S. King, MSIS, MSCS
Ph.D. Candidate
The Information School
University of Washington
An optimist sees a glass half full. A pessimist sees it half empty. An
engineer sees it twice as large as it needs to be.