On 21 Oct 2006 at 16:00, Ray Arachelian wrote:
On an interrupt, you need to switch from userland code
to supervisor code to handle the interrupt.
You can delay processing by having a small ISR which does very little
work and schedules something else to run, but in the end, at some point,
you need to switch from one process to another, and doing so requires
saving all of the registers. With a stack architecture, you need to
flush the stack cache and save only 3-4 registers (PC, SP, SR), and they
can be saved into the process table itself, making context switches cheaper.
You pay the price for context switches at some point regardless.
You'll eat up cycles flushing the cache or storing registers; your
choice.
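The "cheap" switch described above can be sketched in a few lines. This is a hypothetical illustration, not any real OS's code: the struct and function names are made up, and plain assignments stand in for what would be a handful of privileged instructions on real hardware.

```c
#include <stdint.h>

/* Illustrative process-table entry for a stack machine: only the
 * few architectural registers (PC, SP, SR) live here, since the
 * evaluation stack itself is flushed to memory, not saved per-register. */
typedef struct {
    uintptr_t pc;   /* program counter */
    uintptr_t sp;   /* stack pointer   */
    uintptr_t sr;   /* status register */
} proc_entry;

/* Save the outgoing process's minimal state directly into its
 * process-table entry; the incoming process would then be resumed
 * from its own entry (elided here). */
static void context_switch(proc_entry *from, const proc_entry *to,
                           uintptr_t pc, uintptr_t sp, uintptr_t sr)
{
    from->pc = pc;
    from->sp = sp;
    from->sr = sr;
    /* ...flush the on-chip stack cache to memory here... */
    /* ...then reload PC/SP/SR from *to and resume...      */
    (void)to;
}
```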
I've worked with 3-address machines with 256 64-bit registers.
If you're working with even a moderately-sized subroutine under those
circumstances, the entire set of local variables will usually fit in
the register file. A smart compiler or programmer can even segment
the variable set into dynamic and static variables, so that only the
smallest part of the register set has to be saved on exit.
That depends on your code and your optimizer. If you have a lot of
function calls that the optimizer can't inline, it won't be as efficient
as a stack machine. If you have a big blob of code in a single function
that does a lot of floating point, or even integer math, then a stack
machine will lose. Also, does your 3-address machine have a way of
indexing into registers such that r0 is one register, but if you make a
subroutine/function call, r0 is another register?
It's called global optimization. Real compilers do it.
Either that, or your compiler has to inline a lot of the commonly
used functions and assign them different registers from the file
(i.e., it has to treat function calls as if they were calls to
nested functions, and inline them).
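The register-window scheme asked about above can be modeled in a few lines. This is a toy simulation with made-up sizes: a logical r0..r7 maps onto a different slice of a larger physical file each time the window slides on call/return (real designs such as SPARC also overlap adjacent windows to pass arguments, which is omitted here).

```c
/* Toy register-window model: logical register rN resolves to a
 * different physical register depending on the current window. */
enum { PHYS_REGS = 64, WINDOW = 8 };

static long phys[PHYS_REGS];    /* the large physical register file */
static int  cwp = 0;            /* current window pointer           */

/* Map logical rN in the current window to a physical register. */
static long *reg(int n)
{
    return &phys[(cwp * WINDOW + n) % PHYS_REGS];
}

static void win_call(void) { cwp++; }  /* slide the window on call   */
static void win_ret(void)  { cwp--; }  /* slide it back on return    */
```

With this mapping, writing r0, making a call, and writing r0 again touches two different physical registers, so the caller's value survives the callee untouched.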
The point is that no matter how one touts the
benefits of a cache as
being able to substitute for a fast register file, it's a false
claim. A cache can never have the information about the nature of a
program's behavior that the programmer or compiler can--it's too far
removed from the actual program context and must rely on history for
what belongs in the cache and what doesn't.
The stack cache doesn't need any of that information, though. That's
what's nice about register-windowed and stack machines: it also
saves the compiler a lot of headaches in register scheduling and
code optimization. Why would it be any faster if it did? I can see how
temporary values that are discarded would wastefully get written back to
main memory, but beyond that, it's not slower than a large register file.
If it's truly a cache, and not just a queue for the stack, it has to
use some algorithm based on history to determine what's kept in fast
memory and what's not.
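The history-based policy being described is, in its classic form, least-recently-used replacement. A minimal sketch, with made-up structure names and a tiny four-line cache, of how such a cache decides what stays in fast memory:

```c
/* Minimal LRU cache model: every access stamps the line with a
 * logical clock; on a miss, the line with the oldest stamp is evicted. */
enum { LINES = 4 };

typedef struct {
    long          tag;        /* which block this line holds  */
    unsigned long last_used;  /* logical time of last access  */
    int           valid;
} cache_line;

static cache_line   lines_[LINES];
static unsigned long clock_ = 0;

/* Returns 1 on a hit, 0 on a miss (after filling the LRU victim). */
static int cache_access(long tag)
{
    int i, victim = 0;
    for (i = 0; i < LINES; i++) {
        if (lines_[i].valid && lines_[i].tag == tag) {
            lines_[i].last_used = ++clock_;   /* record the reuse */
            return 1;
        }
    }
    for (i = 1; i < LINES; i++)               /* oldest stamp loses */
        if (lines_[i].last_used < lines_[victim].last_used)
            victim = i;
    lines_[victim].tag       = tag;
    lines_[victim].valid     = 1;
    lines_[victim].last_used = ++clock_;
    return 0;
}
```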
Any time spent in compilation is one-off. Who cares how long it
takes in the real world? Loop unrolling, register coloring, global
optimization, variable renaming, vectorization (your stack machine
works with vectors, doesn't it?) are all compile time overhead that
no one really cares about, other than perhaps a few programmers.
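To make one of those one-off compile-time transformations concrete, here is loop unrolling done by hand. This is only an illustration of what an optimizer emits, not code anyone would write: unrolling a reduction by four cuts loop overhead and exposes independent adds the hardware can overlap, with a tail loop handling lengths not divisible by four.

```c
/* Sum an array with the main loop unrolled by four.
 * Four partial sums break the serial dependence chain. */
static long sum_unrolled(const long *a, int n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;
    for (i = 0; i + 4 <= n; i += 4) {   /* unrolled main body */
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)                  /* remainder (tail) loop */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```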
And that was the fatal weakness of the B5000 series. When it came
down to crunching data, no one really cared if Algol-60 was the
system language and the machine didn't have an assembler.
The question is "Aside from the claims of elegance and simplicity,
how fast can you run Linpack?" To date, I've never seen a stack
machine that can beat a traditional multi-address register
architecture at that game. Have you?
Stack machines have been with us at least since the early 1960s.
There must be a reason that, when it comes to applications needing
raw performance, no manufacturer adopts the model.
Cheers,
Chuck