Chuck Guzis wrote:
> Define "well designed"; define "similar". I'm not at all convinced.
"Well that depends on your definition of 'is' " :-) Dude!
seriously!
> But, what defines "performance"? How about "work performed in a
> given period of time"? Could we agree that two machines with roughly
> the same gate count applied in the same way and the same clock rate
> are "similar"? But I doubt that two such examples even exist.
And how does Sun compare itself to IBM, and Intel compare itself to
PowerPC, and so on? Why, benchmarks, of course. Is one supercomputer
faster than another? Does it outperform another? Everyone other than
yourself would say, yes, there is a way to compare two different
machines, even if they don't have the same gate count, even if they
don't have the same clock speed.
To have fair benchmarks, you want the systems as close as possible, but
what you're asking of me in order to prove to you that stack machines
are useful is just inches beyond the insane.
Note that I'm not calling you insane, rather I'm pointing out that your
debating technique is underhanded and unfair. No matter what I throw
your way, you'll find some reason to be unconvinced.
> So to "outperform" would be one such machine producing more useful
> results per unit time than another, no?
Let me play your game by your rules then:
Define "useful."
No, wait, don't bother. I'm sure you'll come up with some new movable
benchmark that no one will be able to meet to your satisfaction. Even
if I were to build you two supercomputers, nearly identical in every
way except for the register file vs. stack machine difference that
we're howling at the moon over, you'd find some way to be unconvinced.
> Where does the function call fit into this picture of getting
> results? Answer: it doesn't. By and large it's a bit of semantic
> fluff--an attempt to re-use code that otherwise might be inlined.
> Unnecessary in the grand scheme of things--and a hindrance to
> performance and code optimization.
Reality calling: it fits.
You've no idea how many CPU cycles are wasted on function calls. Not
actual work being done, but just waiting around to shove data on to the
stack and then shoveling back and forth and back and forth between calls.
If you actually do care about whether this matters or not, take a real
program, run it under a debugger with tracing turned on, all the way
into the OS itself and back to the program, and see just how expensive
function calls really are, even for something as simple as outputting a
character on a serial port, because of all the layers between your
application and the actual hardware: the libraries, the kernel calls,
the drivers.
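To make the point concrete, here's a toy model (Python, purely illustrative: the layer names, the per-frame cost, and the traffic formula are my assumptions, not measurements of any real OS) counting the stack traffic one character's output racks up on its round trip through those layers:

```python
# Toy model: words of stack traffic for one character descending through
# the software layers and returning. FRAME_WORDS and the layer list are
# illustrative assumptions, not real figures.

FRAME_WORDS = 4  # assumed per call: return address plus a few saved registers

def call_chain_stack_traffic(layers):
    """Words pushed on the way down plus words popped on the way back up."""
    return 2 * FRAME_WORDS * len(layers)

layers = ["putchar", "stdio buffering", "write() syscall", "tty layer", "uart driver"]
print(call_chain_stack_traffic(layers))  # 40 words of stack traffic to move 1 byte
```

Under these made-up numbers, five layers of calls cost forty words of stack traffic to deliver a single byte; the exact figure is fiction, but the multiplier effect of layering is the point.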
To take your statements to their logical conclusions, you'd have to live
in a world where libraries, function calls, classes, methods, objects,
and even operating systems do not exist, and code would be gigantic
blobs of spaghetti on machines with enough RAM to inline everything.
That's not how things are done. Even on supercomputers.
Clue: every function call, every method call, every context switch
between one thread and another, or between one process and another,
between userland and kernel code, every one of those guys requires state
to be stored onto the stack. That's where a stack machine accelerates
those calls.
In the real world compilers have to optimize code. In the real world
not every function can or should be inlined. In the real world function
calls are not free. In the real world stacks are not infinite and
access to memory is much slower than access to L1 or L2 cache.
In the real world programmers call libraries, they do not reinvent the
wheel. In the real world libraries call the kernel, and the kernel
calls the drivers. To ignore that would be like measuring a straight
line on a map and saying the distance from NYC to Boston is 150 miles,
ignoring that you cannot possibly get there in a straight line even if
you were to fly there.
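Here's a back-of-the-envelope sketch of that difference (Python; the access counts are assumptions chosen to illustrate the argument, not figures for any real CPU): a conventional register machine spills and restores live registers around a call, while on a stack machine the operands already sit where the callee expects them.

```python
# Memory accesses charged to a single function call in a deliberately
# simplified model. All costs are illustrative assumptions.

def register_machine_call_cost(live_regs, args):
    spill = live_regs      # store the caller's live registers before the call
    restore = live_regs    # load them back after the return
    pass_args = args       # store arguments where the callee can find them
    return spill + restore + pass_args

def stack_machine_call_cost(live_regs, args):
    # Operands are already on the stack where the callee expects them;
    # charge only pushing and popping the return address.
    return 2

print(register_machine_call_cost(live_regs=6, args=3))  # 15
print(stack_machine_call_cost(live_regs=6, args=3))     # 2
```

The real numbers depend on the ABI, the cache, and the compiler's register allocator; the sketch only shows where the asymmetry in the argument comes from.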
> While a stack architecture *might* be better than others in the area
> of function calls, I'm not at all convinced that it is. The
> comparison just has too many subjective aspects.
Yeah, sure, right, and for vector math a vector unit "*might*" be
better, and for floating-point-intensive code an FPU "*might*" be
better, according to you. Uh huh.
Ask yourself this: if call operations are unimportant semantic "fluff",
since according to you they're not important to the results, why bother
with inlining code? After all, it must be that function calls are
useless, right? So they must be free, right? In that case, why should a
compiler inline code? Maybe because doing so speeds things up? But
why? Is it because function calls (and passing of parameters) are
expensive? If so, why? Is it because of shoving all the operands onto
the stack and then reading them back off?
Why would Sun waste all that silicon on implementing register windows if
they don't do anything useful, if they didn't speed up something?
Maybe, just maybe, it is because function calls really ARE expensive!
I agree with you that they don't do any real useful work in the sense of
benchmarks. I disagree with you that they aren't useful or that they're
fluff.
Since you're not convinced, why wouldn't a stack machine be faster at
function calls?
I, and Sun, and AT&T, and designers of past stack machines would all say
that since function calls are expensive, finding a way to deal with them
would be a good thing. Would register windows speed up function calls?
Sun seems to think so! Would a stack architecture speed up function
calls? AT&T thought so, as did many others! So far only you don't.
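The register-window idea can be sketched in a few lines (Python; the window count and spill size are assumptions in the spirit of SPARC, not its actual parameters): a call just rotates to a fresh window, and memory traffic happens only when the call chain outruns the windows.

```python
# Sketch of SPARC-style register windows: a call gets a fresh window
# instead of spilling registers; memory traffic occurs only on window
# overflow/underflow. Window count and spill size are assumptions.

class RegisterWindows:
    def __init__(self, windows=8, window_size=16):
        self.windows = windows
        self.window_size = window_size
        self.depth = 0          # current call depth
        self.mem_accesses = 0   # words spilled to / filled from memory

    def call(self):
        self.depth += 1
        if self.depth >= self.windows:      # overflow: spill the oldest window
            self.mem_accesses += self.window_size

    def ret(self):
        if self.depth >= self.windows:      # underflow: fill the window back
            self.mem_accesses += self.window_size
        self.depth -= 1

rw = RegisterWindows()
for _ in range(5):   # a call chain shallower than the window count...
    rw.call()
for _ in range(5):
    rw.ret()
print(rw.mem_accesses)  # 0: no memory traffic at all for shallow call chains
```

Call chains shallower than the window count touch memory zero times; only deep recursion pays. That's the whole bet Sun made, and it's a bet you only make if calls are expensive enough to be worth the silicon.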
> Almost any instruction is a bit of fluff in the grand view--as long
> as you have at least one to execute. I've certainly used machines
> without stacks or CALL instructions and never really missed them nor
> felt that they would make a substantial contribution to performance.
Right, one instruction here, one instruction there, it doesn't matter.
One bunch of memory accesses here, another there, no matter. One
software floating point math call here, and one there, who needs an FPU? :-)
You personally might not have missed stacks on machines lacking them,
but what exactly did a C, C++, Pascal, Modula, or Java compiler have to
do in order to work on such machines? Did it have to emulate stacks?
Did it have to emulate CALL instructions? Were the opcodes emitted in
order to emulate all of those operations a lot more expensive than a
simple call would have been? Was the code a lot larger because it had
to build all that stuff where a simple call would have sufficed? I
think we both know the answers to those questions.
But of course, none of that matters to you, because you'll just refuse
to be convinced. Anything not to agree with me, right? :-) Reminds me
of the "invincible" Black Knight in Monty Python and the Holy Grail.
Once armless and legless, he still insisted "'Tis but a scratch!"
I'm sorry, I must apologize. I tried. I tried to resist replying, but
it was too much fun pushing your buttons and watching you come up with
more unrealistic constructions! This was just too delicious! :-D