DELUA technical manual, VAX diagnostic


RichA＠LivingComputerMuseum.org

17 Nov 2014

2:05 p.m.

We have a crumbling DELUA in the VAX-11/780-5 at LCM. There is a copy of the *User's* Manual at Bitsavers, but not the *Technical* Manual. Does anyone happen to have a copy lining the hamster cage floor, vel sim.?

The User's Manual refers to a diagnostic, EVDYB, which "runs under the VAX diagnostic supervisor (VDS)." Does anyone here have either of those things, or pointers to them?

In the meantime, I'm going to try to get the PDP-11/70 to run CZUAD from a simulated TU58, but I'm told that getting the DELUA into and out of the crowded Unibus backplane is an exercise in bloodletting.

Thanks,
Rich

Rich Alderson
Sr. Systems Engineer
Living Computer Museum
2245 1st Ave S
Seattle, WA 98134
http://www.LivingComputerMuseum.org/


henk.gooijen＠hotmail.com

17 Nov

2:50 p.m.

-----Original Message-----
From: Rich Alderson
Sent: Monday, November 17, 2014 8:05 PM
To: cctech at classiccmp.org
Subject: DELUA technical manual, VAX diagnostic

...

---

I have an RA82 with "VAX diagnostics" on it, but I will need several months before I will be able to access that drive (problems with the VAX-11/750 console line). I only have the User Manual (the original version, not a copy).

Insertion and removal of the DELUA is no more difficult than for an RL11. The DELUA may even be easier to insert and remove than an RL11, because the RL11 has a 40-pin BERG header at the *top* side of the module, whereas the DELUA has a smaller (16-pin?) header at the *front*, so the cable to the bulkhead is easier to connect on a DELUA. Both boards are "hex", so a bit of "wiggling" helps to get all 6 edge connectors aligned with the backplane slot.

- Henk, PA8PDP

matt＠9track.net

3:59 p.m.

On 17/11/2014 19:05, Rich Alderson wrote:

...

You can get the VAX-11 diagnostics tape here:

ftp://ftp.trailing-edge.com/pub/vax_dists/vax11780diagnostics_vaxsim.tap.bz2

Copy the tape to your system disk (example with a TS11 drive):

$ BACKUP MS0:/SAV SYS$SYSDEVICE:[*...] /LOG

The diagnostics live in [SYS0.SYSMAINT]. If you boot from your system disk with R5 flags set to '10' it will boot to the diagnostic supervisor. You need to attach several devices before you can see the filesystem, e.g.:

DS> ATTACH KA785 SBI KA0
DS> ATTACH DW780 SBI DW0
DS> ATTACH UDA50 DW0 DUA
DS> ATTACH RA82 DUA DUA0
DS> DIR DUA0:[SYS0.SYSMAINT]
DS> RUN DUA0:[SYS0.SYSMAINT]EVDYB

The help is in [SYS0.SYSMAINT]EVDYB.HLP

I think you can also copy a subset of the diagnostics to a console floppy and boot from that. You won't have to attach the disk controller in that case.

Matt

bqt＠update.uu.se

18 Nov

1:39 a.m.

On 2014-11-17 11:05, Rich Alderson wrote:

...

Don't have the manual. I *might* have the diagnostics. I need to check when I'm back in Seattle in a couple of days.

What is "crumbling" about the DELUA?

Johnny

RichA＠LivingComputerMuseum.org

3:21 p.m.

From: Johnny Billquist Sent: Monday, November 17, 2014 10:40 PM

...

On 2014-11-17 11:05, Rich Alderson wrote:

...

> We have a crumbling DELUA in the VAX-11/780-5 at LCM. There is a
> copy of the *User's* Manual at Bitsavers, but not the *Technical*
> Manual. Does anyone happen to have a copy lining the hamster cage
> floor, vel sim.?

...

Don't have the manual. I *might* have the diagnostics. I need to check when I'm back in Seattle in a couple of days.

Thanks, Johnny. Matt Burke's note took care of the diagnostics issue.

...

What is "crumbling" about the DELUA?

There has been an ongoing issue with the network service for several years, but we haven't had the time to go after it. (It appeared to be an issue with CMU/IP, and there were not enough software cycles among the 3 people on the museum project at that time.) What happens is the following, taken from the latest console log:

%%%%%%%%%%% OPCOM 17-NOV-2014 20:24:09.56 %%%%%%%%%%%
Message from user SYSTEM on ROSIE
IPACP: XEDRV: MPBS = 1500

%%%%%%%%%%% OPCOM 17-NOV-2014 20:24:09.60 %%%%%%%%%%%
Message from user SYSTEM on ROSIE
IPACP: XE (DEC ENET) restarted, count = 3

%%%%%%%%%%% OPCOM 17-NOV-2014 20:24:09.63 %%%%%%%%%%%
Message from user SYSTEM on ROSIE
IPACP: XE $QIO error (send),RC=0000001C

%%%%%%%%%%% OPCOM 17-NOV-2014 20:24:18.56 %%%%%%%%%%%
Message from user SYSTEM on ROSIE
IPACP: XEDRV: MPBS = 1500

%%%%%%%%%%% OPCOM 17-NOV-2014 20:24:18.60 %%%%%%%%%%%
Message from user SYSTEM on ROSIE
IPACP: XE (DEC ENET) restarted, count = 4

[snip]

%%%%%%%%%%% OPCOM 17-NOV-2014 23:16:10.56 %%%%%%%%%%%
Message from user SYSTEM on ROSIE
IPACP: XEDRV: MPBS = 1500

%%%%%%%%%%% OPCOM 17-NOV-2014 23:16:10.59 %%%%%%%%%%%
Message from user SYSTEM on ROSIE
IPACP: XE (DEC ENET) restarted, count = 160

Eventually, no connections to the VAX can be completed. A shutdown and reboot (VMS = WNT) would clear it up for a few days--which made it look like a memory leak or something similar.

In the past month and a half, it's gotten more frequent; Friday evening, the system went south (for the Brits, west) after only 2 hours and was unavailable to our users, such as they are, all weekend. This smells much more like a hardware failure than software, so I posted my query about the VAX diagnostic and the tech manual.

We also noted that the socketed parts (including the 68000) were all in tin, so tried the obvious and reseated them to crackling sounds like my knees after sitting too long. The system stayed on the net for more than 8 hours, but it still went away again. Clearly, simplistic fixes are not enough.

That's what's crumbling. :-/

Rich

Rich Alderson
Vintage Computing Sr. Systems Engineer
Living Computer Museum
2245 1st Avenue S
Seattle, WA 98134
mailto:RichA at LivingComputerMuseum.org
http://www.LivingComputerMuseum.org/
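For a rough sense of scale, the restart counts quoted in the console log can be turned into an average restart interval. The following is only a sketch: the function names are hypothetical, the two samples are copied from the log above, and the timestamp parsing assumes an English (or C) locale so that `NOV` is recognised as a month abbreviation.

```python
from datetime import datetime


def parse_opcom_time(stamp: str) -> datetime:
    """Parse an OPCOM-style timestamp such as '17-NOV-2014 20:24:09.60'."""
    return datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S.%f")


def mean_restart_interval(first, last) -> float:
    """Average seconds per restart between two (timestamp, count) samples."""
    (t0, c0), (t1, c1) = first, last
    elapsed = (parse_opcom_time(t1) - parse_opcom_time(t0)).total_seconds()
    return elapsed / (c1 - c0)


# Samples taken from the "restarted, count = N" messages in the log.
first = ("17-NOV-2014 20:24:09.60", 3)
last = ("17-NOV-2014 23:16:10.59", 160)

print(f"{mean_restart_interval(first, last):.1f} s per restart on average")
```

On these two samples the average works out to a little over a minute per restart, although the first two restarts in the log are only about nine seconds apart, so the rate is clearly not steady.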

cctech＠beyondthepale.ie

6:30 p.m.

...

What is "crumbling" about the DELUA?

As far as I know, CMU/IP was a valiant attempt to create a TCP/IP stack that runs exclusively in user mode. I've never known it to work satisfactorily for any length of time in any sort of production environment. Typical problems include processes hanging or exiting for reasons that are difficult to debug and harder to solve.

If the version of VMS and available disk space allows, maybe you could install Multinet from Process Software? Even recent versions of Multinet will install on VAX/VMS as far back as V5.0. Other possibilities include TCPWare and TCPIP Services. I have little experience of either of the latter but in my opinion they wouldn't have to do much to beat CMU/IP for reliability, performance, usability and documentation. And I'm sure none of them would make the interactive response slow and painful the way CMU/IP used to either.

There is a certain argument for preserving the user experience in a museum environment but maybe there are some experiences that are best not preserved :-)

...

%%%%%%%%%%% OPCOM 17-NOV-2014 20:24:09.63 %%%%%%%%%%% Message from user SYSTEM on ROSIE IPACP: XE $QIO error (send),RC=0000001C

You can generally find out what a VMS status code is telling you by just giving it as argument to an EXIT command at the dollar prompt. Here's the result from OpenVMS Alpha V8.3 - some things just don't change:

$ EXIT %X1C
%SYSTEM-F-EXQUOTA, process quota exceeded

IIRC this was a fairly standard CMU/IP failure mode. Unfortunately, VMS has lots of different process quotas and any one of them could be running out. Prime suspect is pagefile quota (virtual memory). If the process hasn't exited, SHOW PROCESS /QUOTA /ID=<process id of IPACP process> may show what the problem is but my bet is that if the relevant quota is increased, it will run out again.
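The EXIT trick above needs a running VMS system. Away from one, the same value can be picked apart by hand, since a VMS condition value packs a severity into its low three bits, a message number into bits 3-15 and a facility number into bits 16-27. The sketch below is illustrative only; the function name and the returned dictionary are made up for this example, and it does not look up message text.

```python
# Severity codes held in the low three bits of a VMS condition value.
SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}


def decode_status(code: int) -> dict:
    """Split a 32-bit VMS condition value into its main fields."""
    return {
        "severity": SEVERITY.get(code & 0x7, "?"),   # bits 0-2
        "message_number": (code >> 3) & 0x1FFF,      # bits 3-15
        "facility": (code >> 16) & 0xFFF,            # bits 16-27
    }


# RC=0000001C from the OPCOM message.
print(decode_status(0x1C))
```

For 0x1C this gives severity F (fatal), message number 3 and facility 0, the SYSTEM facility, which agrees with the %SYSTEM-F- prefix printed by EXIT.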

...

Eventually, no connections to the VAX can be completed. A shutdown and reboot (VMS = WNT) would clear it up for a few days--which made it look like a memory leak or something similar.

I remember having that sort of grief with CMU/IP on a VAX 6410 more than 20 years ago. The problems persisted until it was possible to convince the management to spend the money on a proper TCP/IP stack that ran in kernel mode. That cured it completely and made life so much easier.

...

In the past month and a half, it's gotten more frequent; Friday evening, the system went south (for the Brits, west) after only 2 hours and was unavailable to our users, such as they are, all weekend. This smells much more like a hardware failure than software, so I posted my query about the VAX diagnostic and the tech manual.

I would suspect CMU/IP before the hardware. The increased frequency of the problems may be due to differing conditions on your network.

If the network adaptor is really having hardware problems, it will probably be making entries in the error log. Use SHOW ERROR to make a quick check for devices which are clocking up errors and ANALYZE /ERROR_LOG to format the error log in human readable form. HELP ANALYZE should give hints on what command qualifiers to use to select the error log entries of interest.

If you've already got all sorts of stuff in the error log that you are not interested in, you can RENAME SYS$ERRORLOG:ERRLOG.SYS ERRLOG.OLD for example and the system will start a new error log the next time it has something to log or in a few minutes if nothing happens. If there is nothing of interest in the renamed error log, you can delete it after the new log is started if it is large and disk space is an issue for example.

If your network adaptor is attached to a network transceiver that doesn't have SQE test enabled, you will clock up errors similar to "collision detect carrier check failed" every few seconds. This is highly unlikely to represent a real problem and can be ignored if you can put up with the irritation. Using a transceiver with SQE test enabled should get rid of it.

Regards,
Peter Coghlan.

bqt＠update.uu.se

19 Nov

12:18 a.m.

Lots of good tips there, Peter.

One thing, though. I don't think that the error code from the $QIO in the OPCOM log is a VMS exit code. But I might be wrong on that. But that could do with some more examining.

Things like SQE were things I was also thinking about. Checking the actual system error logs would be an important first step.

Johnny

On 2014-11-18 15:30, Peter Coghlan wrote:

...

What is "crumbling" about the DELUA?

%%%%%%%%%%% OPCOM 17-NOV-2014 20:24:09.63 %%%%%%%%%%% Message from user SYSTEM on ROSIE IPACP: XE $QIO error (send),RC=0000001C

Eventually, no connections to the VAX can be completed. A shutdown and reboot (VMS = WNT) would clear it up for a few days--which made it look like a memory leak or something similar.

cctech＠beyondthepale.ie

11:03 a.m.

...

One thing, though. I don't think that the error code from the $QIO in the OPCOM log is a VMS exit code. But I might be wrong on that. But that could do with some more examining.

There is a poorly phrased entry in the CMU/IP FAQ which could give the impression that CMU/IP uses its own error codes that are entirely different from VMS status codes. What I think it is really trying to say is that like many VMS applications, CMU/IP defines _additional_ status codes that VMS does not already have suitable messages defined for and the text messages associated with these are not available unless the appropriate CMU/IP provided message files are loaded.

Low numbered error codes such as 1C (and another favourite - 0C which is %SYSTEM-F-ACCVIO, access violation) come from system services and runtime library functions that are part of VMS and the message texts are made available automatically by VMS. It is not the case that CMU/IP reporting an error code of 1C means something different to some part of VMS reporting it. They both mean process quota exceeded.

Directly underneath that entry in the FAQ, I found the following:

...

3.1.2 >>>> IPACP CRASH DUE TO QUOTA EXCEEDED [20-MAR-1995]

For systems with a high IP load, IPACP may occasionally crash with a quota exceeded. This does not refer to disk quota, but to one of the process quota limits. Usually, the quota in question is BYTLM. The default BYTLM provided for IPACP (65536) is sufficient for only about 20 connections. IPACP takes about 32000 for itself and each connection takes about 1872 bytes. This requirement is NOT currently documented.

To increase the BYTLM for the IPACP, modify the IP_STARTUP.COM procedure and change the value of the /BUFFER_LIMIT qualifier on the RUN command that starts the IPACP process. Then shut down and restart IPACP.

At the current time, there also appears to be a memory leak in IPACP which has the effect of gradually reducing the available BYTLM over time. When this gets close to zero, IPACP will hang (as it retries) and then crash soon afterwards. It is therefore desirable to give IPACP more BYTLM than the typical load might suggest. If this sort of crash is experienced, increase the BYTLM by 50% and restart it.

<A.Harper at kcl.ac.uk>
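The FAQ's sizing rule can be checked with a few lines of arithmetic. This is a back-of-envelope sketch using only the three figures quoted in the FAQ entry (65536 default BYTLM, about 32000 for IPACP itself, about 1872 per connection); the function name is hypothetical and the model ignores the leak the FAQ mentions.

```python
def max_connections(bytlm: int, base: int = 32000, per_conn: int = 1872) -> int:
    """Whole connections that fit in BYTLM after IPACP's own overhead."""
    return max(0, (bytlm - base) // per_conn)


default_bytlm = 65536
print(max_connections(default_bytlm))           # default quota
print(max_connections(default_bytlm * 3 // 2))  # after the suggested 50% increase
```

With the FAQ's numbers the default quota covers 17 connections, a little short of the "about 20" quoted, and a 50% increase to 98304 covers 35. Either way this is headroom only: if the leak is real, a larger quota postpones the failure rather than preventing it.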

Looks like my pagefile quota guess was wrong and the culprit is BYTLM. However, I suspect the underlying cause of this problem has never been fully addressed and increasing the quota will not help, or worse, will help for about a week before the problem returns even more frequently.

I cannot overemphasise how much relief will be experienced on the replacement of CMU/IP by something that works properly or even by something that doesn't mess up as badly. Problems that you didn't even know you had will go away, even ones which seemed unrelated to networking. On sunny days, the sun will seem brighter and the sky bluer :-)

In my previous posting, I forgot to mention that you can also try:

$ MCR NCP SHOW KNOWN LINE COUNTERS

if running DECnet. This will give DECnet's view on any network media problems including those relating to other protocols going through the same network adapter. It probably won't have much to say about hardware failures in the network adapter though. Remember that on a half duplex Ethernet, collisions are normal and expected but late collisions indicate a problem.

Regards,
Peter Coghlan.

isking＠uw.edu

8:50 p.m.

I have to say this about the LCM VAX: it successfully ran CMUIP for a rather long time. The pattern of failures, becoming more and more frequent, seems to conform to a hardware issue rather than one of software. The machine has never had a large load - when I watched it regularly, there were rarely more than a handful of users at any one time. The machine has been power-cycled in response to errors, which would of course reinitialize all transient data structures - and I do not believe that CMUIP uses persistent caches (i.e. cached to disk).

Yes, LCM could just load UCX and perhaps whistle a happy tune. It might be an interesting experiment to do so and observe behavior - the changes in the startup script could easily be commented in/out. I'd certainly like to see them continue to use CMUIP for historical reasons.

Multinet would also be interesting, if policies have changed at LCM to provide for licensing costs of that software. When I was restoring the machine back in 2008-9, Process Software offered a somewhat amusing 'discount' for an educational institution.

-- Ian

On Wed, Nov 19, 2014 at 8:03 AM, Peter Coghlan <cctech at beyondthepale.ie> wrote:

...


--
Ian S. King, MSIS, MSCS
Ph.D. Candidate
The Information School
University of Washington

An optimist sees a glass half full. A pessimist sees it half empty. An engineer sees it twice as large as it needs to be.

cctech＠beyondthepale.ie

20 Nov

7:22 a.m.

...

I have this amazing feeling of deja-vu all over again :-)

Here, CMU/IP ran ok for ages but then died every few hours for a couple of days. Then it ran ok for another few weeks or months. We blamed the hardware and gave the resident DEC engineer a pain over it. We ran CMU/IP on several machines, and the fact that it seemed to die on some but not on others lent weight to the hardware theory, but there was nothing wrong with our hardware. (We also blamed network problems, and there were network problems but they were nothing to do with CMU/IP falling over.)

Our former colleague who had jumped ship and gone to work for an outfit with money had ditched CMU/IP and pleaded with us to see the light and do the same, but we persisted in trying to debug CMU/IP or to place the problem elsewhere. He had the same problems convincing us that I am having now.

The machines where it wasn't falling over had lousy performance and were painful to use interactively with delayed and jerky echoing and general grief, whether anyone was logged in via CMU/IP or not. All this went away when CMU/IP was removed. It was a revelation to us how smooth and responsive the machines could be.

I guess the museum is doing its job perfectly - it's giving me a vivid recreation of something I went through 20 years ago :-)

I hate to pour scorn on free software that people have obviously put a huge amount of effort into producing but it was not ready for prime time 20 years ago and it has not changed since. One of the big strengths VMS has is reliability. CMU/IP puts a serious dent in that. It also defeats the logical approach to problem solving that VMS normally promotes and remains one of the very few VMS applications that requires reboots to untangle it.

If ever there was a case for putting a software exhibit in a glass case and looking at it rather than using it, CMU/IP is it.

...

Yes, LCM could just load UCX and perhaps whistle a happy tune. It might be an interesting experiment to do so and observe behavior - the changes in the startup script could easily be commented in/out. I'd certainly like to see them continue to use CMUIP for historical reasons. Multinet would also be interesting, if policies have changed at LCM to provide for licensing costs of that software. When I was restoring the machine back in 2008-9, Process Software offered a somewhat amusing 'discount' for an educational institution. -- Ian

Check the error log first, just in case.

I don't know whether UCX is compatible with happy tunes. Early versions did not enjoy a good reputation but maybe it has improved. Still, it's got to be better than CMU/IP and it might exonerate the hardware.

It might be worth contacting Process again. Things might have changed over the last five years. They might be open to a special deal for a contemporary version of Multinet on a museum exhibit, provided the machine is not used for anything beyond that.

Regards,
Peter Coghlan.
