On May 5, 2013, at 1:05 PM, Peter Coghlan wrote:
> Do the SHOW ERROR anytime and if anything (other than a tape drive,
> terminal or any sort of virtual device) shows a non-zero error count,
> investigate the reason for it. Particularly watch for MEMORY or CPU
> errors. If your disk device starts clocking errors just before the
> system stalls, that would tend to point the finger in its direction.

Ah, OK. So I get some PUA0: errors ticking up every so often, most
of which are right after the system has had some "hiccup" of non-
interactivity for several minutes. Some initial research indicates
that the PU device is the UDA controller, which is what I believe
the CQD-220 masquerades as (more or less). Right now, after three
hours of sitting basically idle, it's logged 26 errors.
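
For anyone who wants to automate the check Peter describes, here is a rough sketch in Python. It assumes the SHOW ERROR output has been captured as text with a device name and an error count on each line. The sample text is made up apart from the PUA0: count above, and the list of prefixes to ignore (tape and terminal devices) is my assumption, so adjust it for the actual system.

```python
# Hypothetical captured output; only the PUA0: count comes from this thread.
SAMPLE = """\
Device                           Error Count
PUA0:                                     26
MUA0:                                      3
TTA1:                                      1
"""

# Assumed prefixes for device types to ignore (tape, terminal).
IGNORED_PREFIXES = ("MU", "MK", "TT", "TX")


def suspicious_devices(text, ignored=IGNORED_PREFIXES):
    """Return (device, count) pairs with a non-zero count, skipping ignored types."""
    found = []
    for line in text.splitlines():
        parts = line.split()
        # Only accept lines shaped like "NAME: COUNT"; this skips the header.
        if len(parts) != 2 or not parts[0].endswith(":") or not parts[1].isdigit():
            continue
        device, count = parts[0].rstrip(":"), int(parts[1])
        if count > 0 and not device.startswith(ignored):
            found.append((device, count))
    return found


print(suspicious_devices(SAMPLE))
```

Running it on captures taken before and after a stall would show which counts moved.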
So I'll have to check that out. It could be something as simple as
balky SCSI termination, since I do have 4 devices in the chain and
the terminator is a Jaz drive using internal termination. I'll
play with it some. Hopefully it's not the hard drive going bad,
since my supply of SCSI drives is essentially dry. It's a bit
worrisome that the errors are logged against the PU device rather
than the DU device, but I'm not 100% sure how VMS logs the errors,
so it could still be what I suspect.
- Dave