Paul Koning wrote:
From: cctalk-bounces at classiccmp.org [mailto:cctalk-bounces at classiccmp.org] On Behalf Of William Donzelli
Sent: Tuesday, April 07, 2009 5:44 PM
To: General Discussion: On-Topic and Off-Topic Posts
Subject: Re: The lost art (Was: The VAX is running
And too few people who take the trouble to learn the assembly bother to learn the machine representation of the instructions--or the architecture of the implementation. Even fewer learn how to time instruction execution--perhaps it's no longer relevant.
All of this is now the job of the compiler.
It ain't 1980 anymore.
True. But it remains the case that there are tasks that push the limits
of what the available processors can do. Or there may be special
considerations that you simply can't tell the compiler -- either it
doesn't know, or you can't say it in C.
I continue to do assembly language programming, occasionally. It's a
small fraction of the total code but it has to be done that way for one
reason or another; the compiler just isn't an option.
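Paul's actual code isn't shown, but one classic instance of something "you can't say in C" is reading a CPU register directly. A generic sketch (my illustration, not his code), assuming x86-64 and GCC or Clang extended inline assembly, reading the time-stamp counter:

```c
#include <stdint.h>

/* Generic illustration, not the code discussed above: there is no
   portable C expression for the x86 RDTSC instruction, so it must be
   written in assembly.  Assumes x86-64 and GCC/Clang extended asm. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    /* RDTSC leaves the low 32 bits of the counter in EAX and the
       high 32 bits in EDX. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```

The compiler can inline and schedule this like any other function, but the instruction itself had to be stated in assembly.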
For example, I have some cache flush code that runs about 10x the speed
of C code. And it works correctly, which the C code can't because the
hardware has some very odd requirements that no compiler ever written
can cope with.
A compiler also has trouble when the goal is to exploit the extra speed of the L1 and even L2 caches, as opposed to the speed available from RAM alone.
For example, I have an inner loop which is very fast when its data fits in the L1 cache. The loop runs very efficiently so long as the advance increment is small enough for the working set to stay within the L1 cache. But when the advance increment grows too large for L1, the inner loop range must be modified to work within the L2 cache instead. Eventually, even L2 is too small and actual RAM must be used.
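The actual loop isn't shown, so here is a rough sketch of the blocking idea under stated assumptions (8-byte long, a 32 KB L1 data cache): make all the passes over one L1-sized block before moving to the next block, instead of streaming the whole array on every pass.

```c
#include <stddef.h>

#define BLOCK 2048   /* 2048 8-byte longs = 16 KB, well inside a 32 KB L1 */
#define PASSES 4

/* Unblocked: each pass streams the whole array, so by the time one
   pass ends, the data it started with has been evicted from L1. */
long sum_unblocked(const long *a, size_t n)
{
    long s = 0;
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < n; i++)
            s += a[i] * (p + 1);
    return s;
}

/* Blocked: all PASSES passes run over one BLOCK-sized chunk while it
   is still resident in L1, then move on.  Same arithmetic, same
   result, but far fewer cache misses on large arrays. */
long sum_blocked(const long *a, size_t n)
{
    long s = 0;
    for (size_t b = 0; b < n; b += BLOCK) {
        size_t end = (b + BLOCK < n) ? b + BLOCK : n;
        for (int p = 0; p < PASSES; p++)
            for (size_t i = b; i < end; i++)
                s += a[i] * (p + 1);
    }
    return s;
}
```

Both functions perform the identical set of integer additions, so they return the same value; only the order of memory traffic differs.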
No compiler will ever know the L1 and L2 cache sizes of every CPU the code might run on, so this application can reach maximum efficiency only with code tuned to the relative speeds of L1, L2 and RAM on the machine at hand. In addition, with the newest CPUs just out, L3 cache will likely add yet another layer of complexity.
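What the compiler can't know at compile time can sometimes be discovered at run time. A sketch, assuming Linux with glibc (where _SC_LEVEL1_DCACHE_SIZE and _SC_LEVEL2_CACHE_SIZE are nonstandard extensions that may report 0 on some systems), of choosing a block size from the actual cache sizes; the half-of-L1 policy is my hypothetical choice, not the poster's:

```c
#include <stddef.h>
#include <unistd.h>

/* Ask the OS for a cache size; fall back to a conservative guess when
   the query is unsupported (sysconf returns 0 or -1).
   _SC_LEVEL1_DCACHE_SIZE etc. are glibc extensions, not POSIX. */
static long cache_bytes(int name, long fallback)
{
    long v = sysconf(name);
    return (v > 0) ? v : fallback;
}

/* Size a per-block element count to half of L1, leaving the rest of
   the cache for other data and code -- a hypothetical policy for
   illustration only. */
static long block_elems(size_t elem_size)
{
    long l1 = cache_bytes(_SC_LEVEL1_DCACHE_SIZE, 32 * 1024);
    return l1 / 2 / (long)elem_size;
}
```

The same query works for L2 via _SC_LEVEL2_CACHE_SIZE, so an outer blocking level can be sized the same way.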
This does not mean that all coding problems require this level of sophistication. In some cases, a compiler is the best solution. And for code which is run only once, the efficiency often does not matter, since just getting a non-trivial program to work takes more time than any optimization can ever gain. So there are always going to be many programs, maybe even most, where optimization is not relevant. But there will also be a few programs where optimization is essential, and in some cases is the actual point of the program in the first place.
Jerome Fine