On May 5, 2013, at 1:05 PM, Peter Coghlan wrote:
Do the SHOW ERROR anytime and if anything (other than a tape drive,
terminal or any sort of virtual device) shows a non-zero error count,
investigate the reason for it. Particularly watch for MEMORY or CPU
errors. If your disk device starts clocking errors just before the
system stalls, that would tend to point the finger in its direction.
Ah, OK. So I get some PUA0: errors ticking up every so often, most of
which are right after the system has had some "hiccup" of
non-interactivity for several minutes. Some initial research indicates that
the PU device is the UDA controller, which is what I believe the CQD-220
masquerades as (more or less). Right now, after three hours of sitting
basically idle, it's logged 26 errors.
So I'll have to check that out. It could be something as simple as
balky SCSI termination, since I do have 4 devices in the chain and the
terminator is a Jaz drive using internal termination. I'll play with it
some. Hopefully it's not the hard drive going bad, since my supply of
SCSI drives is essentially dry. It's a bit worrisome that the errors are
logged against the PU device rather than the DU device, but I'm not 100%
sure how VMS logs the errors, so it could be what I suspect.
How long is the SCSI chain? The first thing that came to mind for me was
the possibility of a SCSI bus issue since I've seen similar behavior with
other systems. In fact, since you just mentioned the Jaz drive, it could
very well be the culprit. I had major compatibility issues with Jaz drives
in a non-PC application back when they were current products and I ended
up having to connect them to a PC to update their firmware and change
their internal settings. A quick Google search turned up this link too: