Jim wrote:
In 1983, Metheus Corp. used the classic 68k x 2 CPU arrangement to fix
the double bus fault problem on their VLSI design workstation.
I never could figure out how you could get the system into a double page
data/code/stack fault in the exception handler, but it's a known
problem.
The MC68000 didn't have a double bus fault problem. It had a *single*
bus fault problem!
On either a bus error or an address error exception, the information
the processor puts in the exception stack frame is not sufficient to
either resume or undo the faulted instruction. [*] These errors were
considered fatal, and the typical "recovery" was to kill the process
that faulted. Obviously this is not conducive to implementation of
virtual memory.
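To make "not sufficient" concrete, here is a rough C sketch of the 7-word MC68000 bus/address error frame, lowest stack address first. The layout and SSW bit assignments are from my reading of the MC68000 user's manual and should be checked against it; the struct and function names are made up for illustration. The frame records the access address, the access type, the instruction register, SR and a PC. It says nothing about how far the instruction had progressed (for example, whether an (An)+ register had already been incremented), and, as I recall the manual, the saved PC is only known to be somewhere 2 to 10 bytes past the start of the faulting instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the MC68000 bus/address error frame, lowest stack address
   first. Layout per my reading of the MC68000 manual; verify it. */
struct m68000_group0_frame {
    uint16_t ssw;      /* bit 4 R/W (1 = read), bit 3 I/N (1 = not an
                          instruction fetch), bits 2-0 function code */
    uint16_t addr_hi;  /* access address, high word */
    uint16_t addr_lo;  /* access address, low word */
    uint16_t ir;       /* instruction register */
    uint16_t sr;       /* status register */
    uint16_t pc_hi;    /* saved PC: somewhere 2-10 bytes past the start */
    uint16_t pc_lo;    /* of the faulting instruction, not an exact point */
};

static_assert(sizeof(struct m68000_group0_frame) == 7 * 2, "7 words");

uint32_t group0_access_address(const struct m68000_group0_frame *f) {
    return ((uint32_t)f->addr_hi << 16) | f->addr_lo;
}

int group0_is_read(const struct m68000_group0_frame *f) {
    return (f->ssw >> 4) & 1;
}

int group0_is_instruction_fetch(const struct m68000_group0_frame *f) {
    return ((f->ssw >> 3) & 1) == 0;   /* I/N = 0 means instruction */
}

unsigned group0_function_code(const struct m68000_group0_frame *f) {
    return f->ssw & 7u;
}
```

A handler can tell *what* access failed from these words, but not *where* inside the instruction it was, so neither re-executing nor continuing the instruction is safe in general.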
What some companies did, perhaps including Metheus, was to use two
MC68000 processors. When a page fault occurs, rather than signalling
a bus error to the running processor, the hardware "suspends" that
processor (disconnects it from the bus, but leaves it waiting for the
memory cycle to complete). The hardware enables the other processor,
which runs the page fault handler. When the fault handler completes,
the hardware switches back to the first processor, which can then
complete the cycle.
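The two-processor trick is easier to see as a toy software model. This is only a sketch under assumed parameters (4 KiB pages, a tiny address space, a boolean "present" bit per page); the names `executor_read`, `fixer_page_fault_handler` and `bus_owner` are hypothetical and don't describe any real board's logic. The key point it shows is that the executing CPU never sees a fault at all: its read cycle simply takes a long time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two-processor scheme; all names are hypothetical. */
#define PAGE_SHIFT 12u                     /* assumed 4 KiB pages */
#define NPAGES     8u                      /* assumed 32 KiB address space */

enum cpu_id { EXECUTOR, FIXER };

static uint16_t memory[(NPAGES << PAGE_SHIFT) / 2];  /* 16-bit words */
static bool page_present[NPAGES];
static enum cpu_id bus_owner = EXECUTOR;
static unsigned fixer_runs;                /* times the fixer ran the handler */

/* Runs on the second (fixer) CPU; marking the page present stands in
   for reading it from disk and updating the page tables. */
static void fixer_page_fault_handler(uint32_t addr) {
    assert(bus_owner == FIXER);
    page_present[addr >> PAGE_SHIFT] = true;
    fixer_runs++;
}

/* A read cycle started by the executor. On a missing page the hardware
   does not assert bus error; it leaves the executor waiting mid-cycle,
   gives the bus to the fixer, and completes the cycle afterwards. */
uint16_t executor_read(uint32_t addr) {
    assert(addr / 2 < sizeof memory / sizeof memory[0]);
    if (!page_present[addr >> PAGE_SHIFT]) {
        bus_owner = FIXER;                 /* executor suspended */
        fixer_page_fault_handler(addr);
        bus_owner = EXECUTOR;              /* executor's cycle completes */
    }
    return memory[addr / 2];
}
```

A second read from the same page goes straight through, because the fixer already brought it in.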
The MC68010 was introduced largely to fix this problem, though it had
a few other enhancements as well. In the MC68010 (and MC68012,
MC68020, MC68030, and CPU32), on a bus error or address error the
processor pushes enough of its internal state onto the stack that an
page fault handler can determine exactly which memory access faulted,
fix the problem (e.g., by bringing in a page from disk and updating
the page tables), and resume the faulted instruction from where it
left off.
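Once the processor saves enough state, the handler's own job is ordinary bookkeeping. A minimal sketch, assuming a flat single-level page table, 4 KiB pages and a trivial frame allocator (all hypothetical, not any particular Unix's design), of what a 68010-class handler does with the fault address it pulls out of the frame:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* hypothetical 4 KiB pages */
#define NPAGES     16u   /* hypothetical 64 KiB per-process address space */
#define NFRAMES    4u    /* hypothetical amount of physical memory */

struct pte { bool present; uint16_t frame; };   /* hypothetical PTE */

static struct pte page_table[NPAGES];
static uint16_t next_free_frame;

enum fault_action { RESUME_INSTRUCTION, KILL_PROCESS };

/* fault_addr is the address the processor saved in the bus error frame. */
enum fault_action handle_page_fault(uint32_t fault_addr) {
    uint32_t vpn = fault_addr >> PAGE_SHIFT;
    if (vpn >= NPAGES)
        return KILL_PROCESS;                 /* genuinely bad address */
    if (!page_table[vpn].present) {
        /* A real kernel would evict a page when memory is full;
           this sketch just assumes there is always a free frame. */
        assert(next_free_frame < NFRAMES);
        page_table[vpn].frame = next_free_frame++;   /* "read from disk" */
        page_table[vpn].present = true;
    }
    /* On a 68010 the handler would now return with RTE, which reloads
       the saved internal state and continues the faulted instruction. */
    return RESUME_INSTRUCTION;
}
```

The only difference from the MC68000 case is that the final "resume" step is actually possible.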
The bus error stack frame was affectionately known as the "stack puke".
On the MC68010, this occupied 29 16-bit words (58 bytes), although only
26 of them were written; the other 3 were reserved. About 16 words of it were
undocumented internal state that the software was not supposed to
touch.
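For the curious, here is the stack puke written out as a C struct, lowest stack address first. The field names and order are from my reading of the MC68010 user's manual (the "format $8" frame) and should be checked against it; the accessor names are hypothetical. Every member is a 16-bit word so the compiler can't insert padding, and the size check confirms the 29-word total.

```c
#include <assert.h>
#include <stdint.h>

/* MC68010 bus/address error frame (format $8), lowest stack address
   first. Field names per my reading of the manual; treat as a sketch. */
struct m68010_format8_frame {
    uint16_t sr;                        /* status register */
    uint16_t pc_hi, pc_lo;              /* program counter */
    uint16_t format_vector;             /* bits 15-12 format, 11-0 vector offset */
    uint16_t ssw;                       /* special status word */
    uint16_t fault_addr_hi, fault_addr_lo;
    uint16_t reserved1;
    uint16_t data_output_buffer;
    uint16_t reserved2;
    uint16_t data_input_buffer;
    uint16_t reserved3;
    uint16_t instruction_input_buffer;
    uint16_t internal[16];              /* version number + undocumented state */
};

static_assert(sizeof(struct m68010_format8_frame) == 29 * 2, "29 words");

unsigned frame_format(const struct m68010_format8_frame *f) {
    return f->format_vector >> 12;
}

unsigned frame_vector_offset(const struct m68010_format8_frame *f) {
    return f->format_vector & 0x0FFFu;  /* bus error is vector 2, offset $008 */
}

uint32_t frame_fault_address(const struct m68010_format8_frame *f) {
    return ((uint32_t)f->fault_addr_hi << 16) | f->fault_addr_lo;
}
```

Counting members: 13 documented or reserved words plus 16 internal words gives 29, with 3 of them reserved.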
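On the M68K family processors, if a bus error occurred while the
processor was still processing a previous bus error, address error, or
reset exception (i.e., stacking the frame or fetching the vector, before
the handler ever ran), that was a double bus fault and would cause a
processor halt. This would normally only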
happen due to a software bug, such as a failure to allocate enough
pages for the supervisor stack, and there was no means of recovery
other than a hardware reset. In principle a multiple processor
hack as described above could be used to allow recovery from this
condition, but there wasn't really any reason to do so.
Eric
[*] If you write software such that only specific, known instructions
can generate bus faults, you can recover from them. This technique
is called probing, and was used in the Apple Lisa operating system.
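The Lisa's own probing code isn't reproduced here. As a rough modern analog, assuming a POSIX-like system with `sigaction`, `sigsetjmp`/`siglongjmp` and `mmap` (Linux, the BSDs, macOS), the same idea looks like this: exactly one known load is allowed to fault, and the fault handler abandons it and jumps to a fixed recovery point instead of trying to resume it. `probe_read` and `probe_fault` are hypothetical names.

```c
#define _GNU_SOURCE            /* for MAP_ANONYMOUS with strict -std= modes */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf probe_env;

static void probe_fault(int sig) {
    (void)sig;
    siglongjmp(probe_env, 1);  /* abandon the load, go to the recovery point */
}

/* Returns 1 if *p can be read, 0 if the read faults. The handler is only
   installed around one known load, which is what makes recovery safe. */
int probe_read(const volatile unsigned char *p) {
    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    volatile int ok = 0;
    if (sigsetjmp(probe_env, 1) == 0) {
        volatile unsigned char sink = *p;   /* the one load allowed to fault */
        (void)sink;
        ok = 1;
    }

    sigaction(SIGSEGV, &old_segv, NULL);    /* restore previous handlers */
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}
```

Depending on the OS, touching a `PROT_NONE` page raises SIGSEGV or SIGBUS, which is why both are caught.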