Recall that the other day I was seeing this when running AJRLHA.DG:
STATE NOT 5 AFTER SEEK WITH 0 DIFFERENCE (this error and register dump
prints twice in succession. State 5 = Lock-on, keeping on track).
WD1 0317 (lower head, heads not out, spin-down)
WD2 0204 (write data error, write gate error). This one worries me. WDE is
"write gate on but no transitions on write data line". The drive should
never be trying to write when there is no data being sent, and this
diagnostic program does not do any writes to disk! So a fault, head
retraction and shutdown is the proper response to this fatal error. I'll
have to put a logic analyzer on the appropriate bits and see why it thinks
it's supposed to be writing... or where the command came from.
ERROR FLAG SET
ER 2000 (Operation incomplete within 200 ms). Probably because the drive
shut down when the WDE error occurred.
CB 0003 (Seek)
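For anyone who wants to check my reading of the status bytes, here is a minimal Python sketch of a decoder. The bit layout is an assumption taken from my recollection of the RL01/RL02 drive status word (WD1 as the low byte, WD2 as the high byte); it agrees with the annotations above but should be verified against the manual. The names STATES, WD1_FLAGS, WD2_FLAGS and decode_status are made up for this example.

```python
# Sketch: decode the two drive status bytes printed by the diagnostic.
# ASSUMPTION: WD1 is the low byte and WD2 the high byte of the RL01/RL02
# drive status word, with the bit layout below. Verify against the manual.

STATES = [
    "Load cartridge", "Spin-up", "Brush cycle", "Load heads",
    "Seek", "Lock-on", "Unload heads", "Spin-down",
]

WD1_FLAGS = [
    (0o010, "brushes home"),
    (0o020, "heads out"),
    (0o040, "cover open"),
    (0o100, "lower head selected"),
    (0o200, "drive type RL02"),
]

WD2_FLAGS = [
    (0o001, "drive-select error"),
    (0o002, "volume check"),
    (0o004, "write gate error"),
    (0o010, "spin error"),
    (0o020, "seek time-out"),
    (0o040, "write lock"),
    (0o100, "head current error"),
    (0o200, "write data error"),
]


def decode_status(wd1, wd2):
    """Return (state name, list of flag names) for one WD1/WD2 pair."""
    state = STATES[wd1 & 0o7]  # low three bits hold the drive state
    flags = [name for mask, name in WD1_FLAGS if wd1 & mask]
    flags += [name for mask, name in WD2_FLAGS if wd2 & mask]
    return state, flags


state, flags = decode_status(0o317, 0o204)
print(f"state={state}; " + ", ".join(flags))
```

Fed the values above (WD1 0317, WD2 0204), it reports the Spin-down state with the lower head selected, heads not out, and both the write gate and write data error bits set.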
Today the fault immediately on starting the test is still present (and yes,
Henk, it did occur to me that there might be something wrong with the
diagnostic because all the others work! Has anyone got the source code for
AJRLHA?)
However, there are different initial errors today!
Diagnostic starts
(Fault light comes on immediately):
Prints this block twice:
Error flag set
ER 2002 (Operation incomplete, Drive error)
CB 0103 (12-bit data mode, Seek). <--- This seems to be OK: 8-bit mode is
required for the Maintenance, Get Status and Read Header commands, but not
for Seek.
then Error flag set
ER 2000, CB 0103; (same as above but no Drive error)
Fault light goes off;
then ER not as expected but error flag not set:
WD1 0235 (Heads out, upper head locked on track)
WD2 0002 (volume check bit)
ER 0003 (Drive error; Drive ready);
WD1 0335 (same except lower head)
WD2 0002 (volume check bit)
ER 0003 (Drive error; Drive ready);
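To compare the ER values at a glance, a small Python sketch follows. It uses only the three bit meanings inferred from the annotations above (0001 = Drive ready, 0002 = Drive error, 2000 = Operation incomplete); any other bit is reported as unknown rather than guessed. ER_FLAGS and decode_er are names invented for this example.

```python
# Sketch: decode the ER values seen in the diagnostic output.
# ASSUMPTION: only the three bits whose meaning appears in the annotations
# above are named here; the rest of the register is left undecoded.

ER_FLAGS = [
    (0o0001, "drive ready"),
    (0o0002, "drive error"),
    (0o2000, "operation incomplete"),
]


def decode_er(er):
    """Return (list of known flag names, leftover undecoded bits)."""
    names = []
    known = 0
    for mask, name in ER_FLAGS:
        known |= mask
        if er & mask:
            names.append(name)
    return names, er & ~known


for er in (0o2000, 0o2002, 0o0003):
    names, unknown = decode_er(er)
    print(f"ER {er:04o}: {', '.join(names)} (undecoded bits: {unknown:04o})")
```

Laid out this way, the sequence is easier to follow: operation incomplete with a drive error, then operation incomplete alone, then a drive error with the drive reporting ready.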
Per the user's manual, the Fault light only comes on with the following
errors:
1 Drive-select error,
2 Seek time-out error,
3 Write current in heads (during sector time) error,
4 Loss of system clock (this condition is not latched and not represented in
status word),
5 Write-protect error,
6 Write data error,
7 Spin error.
I am confident that the reported fault is not 1, 4, 5 or 7. The drive is
being selected properly, works except on initial test, the write protect
switch is not set, and the drive stays spinning with Ready light on when not
being accessed.
However, that still leaves a seek time-out (reported today) or a write error
(seen two days ago) when there shouldn't *be* any writes.
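The elimination above can be written out as a short Python sketch. The cause list is the one quoted from the user's manual; the "ruled out" reasons are my own observations, and the names FAULT_CAUSES, RULED_OUT and remaining are made up for this example.

```python
# Sketch: which Fault-light causes are still open after my observations?
# The numbering follows the list from the user's manual quoted above.

FAULT_CAUSES = {
    1: "Drive-select error",
    2: "Seek time-out error",
    3: "Write current in heads (during sector time) error",
    4: "Loss of system clock",
    5: "Write-protect error",
    6: "Write data error",
    7: "Spin error",
}

# Causes I am confident are not involved, with the reason for each.
RULED_OUT = {
    1: "drive is being selected properly",
    4: "drive works except on the initial test",
    5: "write protect switch is not set",
    7: "drive stays spinning with the Ready light on",
}

remaining = {n: name for n, name in FAULT_CAUSES.items() if n not in RULED_OUT}

for n, name in remaining.items():
    print(f"{n} {name}")
```

Note that this leaves three candidates, not two: cause 3 (write current in the heads during sector time) is a write-related fault as well, so it stays on the list alongside the seek time-out and the write data error.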
I really want to find out why the drive previously thought it was being told
to write, at the wrong time.
I just had a new idea:
What if a command register is being corrupted between the setting by the
program, and the drive electronics?
Say a Write Data command (CB xxx5) is erroneously received by the drive but
the proper registers for a write have not been set up. That would Fault the
drive and the diagnostic would report an error.
Another example: The diagnostic is issuing the proper Seek command (CB
xxx3), but the drive is actually receiving something else, so the expected
seek would time out in the diagnostic and, depending on the command the
drive actually received, could light the Fault light too.
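A quick sanity check on the corruption idea, as a Python sketch. It assumes the function code is the low octal digit of CB (as in xxx3 for Seek and xxx5 for Write Data) and that a fault flips individual bits independently; both are assumptions, and SEEK, WRITE_DATA, bit_distance and single_bit_neighbours are names invented here.

```python
# Sketch: how many bits must change for a Seek (3) to be seen as
# Write Data (5), and what can a single flipped bit turn a Seek into?
# ASSUMPTION: the function code is the low three bits of CB.

SEEK = 0o3
WRITE_DATA = 0o5
FUNC_MASK = 0o7


def bit_distance(a, b):
    """Number of function-code bits that differ between a and b."""
    return bin((a ^ b) & FUNC_MASK).count("1")


def single_bit_neighbours(code):
    """Function codes reachable from `code` by flipping exactly one bit."""
    return sorted(code ^ (1 << bit) for bit in range(3))


print(bit_distance(SEEK, WRITE_DATA))   # 2
print(single_bit_neighbours(SEEK))      # [1, 2, 7]
```

So under these assumptions a single bad bit cannot turn a Seek into a Write Data; two bits would have to be wrong at once. A single bad bit would instead produce function code 1, 2 or 7, which could still explain the seek timing out.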
When attempting to run Dumprest for RL during the previous session, I had to
add retries for seeking because the program would halt with a seek error
there too.
So I'm now suspicious of an intermittent or partial short (another whisker?)
between the command registers and the drive. Maybe it's not "hearing" the
controller properly! It's even got the correct expensive DEC cables between
the card and the drives, and a terminator on the farthest drive.
Jon wrote:
I'm pretty far away from competence on PDP-8s anymore, but the symptoms
sound like maybe the drive faults on LONG seeks, but as long as the seeks
are short, it works OK. There might be a one-shot in the controller that
allows so many ms for a seek to complete, and due to aging capacitors, the
delay is now too short. But, that's a totally wild guess, there could be
troubles in the drive seek electronics that only occur on longer seeks.
That's an interesting SWAG, thanks :)
I checked the 22 uF capacitor (and 39K resistor) that provide the timeout
delay. They are OK. If anything, the 22 uF measures well on the high side,
thus giving a longer delay.
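As a rough check of that reasoning, here is the R*C arithmetic in Python. The actual timeout is some part-specific multiple of R*C that depends on the one-shot used, which I have not worked out here, so only the proportionality matters. The +20% capacitor value is purely hypothetical, chosen to illustrate "on the high side".

```python
# Sketch: the timeout scales with R*C, so a capacitor that reads high
# lengthens the delay rather than shortening it.
# ASSUMPTION: the +20% figure is hypothetical, not a measured value.

R = 39e3            # ohms
C_NOMINAL = 22e-6   # farads


def time_constant(r, c):
    """Plain RC time constant in seconds."""
    return r * c


tau = time_constant(R, C_NOMINAL)
tau_high = time_constant(R, C_NOMINAL * 1.2)

print(f"nominal R*C: {tau * 1e3:.0f} ms")
print(f"with +20% C: {tau_high * 1e3:.0f} ms")
```

Whatever the one-shot's multiplier is, a larger C can only stretch the window, so an aged-short timeout does not look like the culprit here.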
Time to toggle in some more programs I guess.
What really bugs me is that this whole system was completely working for
years... up until it didn't :P