Paul Koning wrote:
From: cctalk-bounces at classiccmp.org [mailto:cctalk-bounces at classiccmp.org] On Behalf Of William Donzelli
Sent: Tuesday, April 07, 2009 5:44 PM
To: General Discussion: On-Topic and Off-Topic Posts
Subject: Re: The lost art (Was: The VAX is running
And too few people who take the trouble to learn the assembly bother to learn the machine representation of the instructions--or the architecture of the implementation. Even fewer learn how to time instruction execution--perhaps it's no longer relevant.
All of this is now the job of the compiler.
It ain't 1980 anymore.
True. But it remains the case that there are tasks that push the limits
of what the available processors can do. Or there may be special
considerations that you simply can't tell the compiler -- either it
doesn't know, or you can't say it in C.
I continue to do assembly language programming, occasionally. It's a
small fraction of the total code but it has to be done that way for one
reason or another; the compiler just isn't an option.
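Paul's actual code isn't shown, but one classic instance of something "you can't say in C" is reading a CPU register directly. A generic sketch (my illustration, not his code), assuming x86-64 and GCC or Clang extended inline assembly, reading the time-stamp counter:

```c
#include <stdint.h>

/* Generic illustration, not the code discussed above: there is no
   portable C expression for the x86 RDTSC instruction, so it must be
   written in assembly.  Assumes x86-64 and GCC/Clang extended asm. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    /* RDTSC leaves the low 32 bits of the counter in EAX and the
       high 32 bits in EDX. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```

The compiler can inline and schedule this like any other function, but the instruction itself had to be stated in assembly.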
For example, I have some cache flush code that runs about 10x the speed
of C code. And it works correctly, which the C code can't because the
hardware has some very odd requirements that no compiler ever written
can cope with.
A compiler also has trouble when the goal is to exploit the extra speed of the L1 and even L2 caches, as opposed to the speed available from RAM alone.
For example, I have an inner loop which is very fast when its data fits in the L1 cache. The loop runs very efficiently so long as the advance increment is small enough for the working set to stay within the L1 cache. But when the advance increment grows too large for L1, the inner loop range must be modified to work within the L2 cache instead. Eventually, even L2 is too small and actual RAM must be used.
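The actual loop isn't shown, so here is a rough sketch of the blocking idea under stated assumptions (8-byte long, a 32 KB L1 data cache): make all the passes over one L1-sized block before moving to the next block, instead of streaming the whole array on every pass.

```c
#include <stddef.h>

#define BLOCK 2048   /* 2048 8-byte longs = 16 KB, well inside a 32 KB L1 */
#define PASSES 4

/* Unblocked: each pass streams the whole array, so by the time one
   pass ends, the data it started with has been evicted from L1. */
long sum_unblocked(const long *a, size_t n)
{
    long s = 0;
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < n; i++)
            s += a[i] * (p + 1);
    return s;
}

/* Blocked: all PASSES passes run over one BLOCK-sized chunk while it
   is still resident in L1, then move on.  Same arithmetic, same
   result, but far fewer cache misses on large arrays. */
long sum_blocked(const long *a, size_t n)
{
    long s = 0;
    for (size_t b = 0; b < n; b += BLOCK) {
        size_t end = (b + BLOCK < n) ? b + BLOCK : n;
        for (int p = 0; p < PASSES; p++)
            for (size_t i = b; i < end; i++)
                s += a[i] * (p + 1);
    }
    return s;
}
```

Both functions perform the identical set of integer additions, so they return the same value; only the order of memory traffic differs.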
No compiler will ever know the L1 and L2 cache sizes of every CPU the code might run on, so this application can reach maximum efficiency only with code tuned to the relative speeds of L1, L2 and RAM on the machine at hand. In addition, with the newest CPUs just out, L3 cache will likely add yet another layer of complexity.
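What the compiler can't know at compile time can sometimes be discovered at run time. A sketch, assuming Linux with glibc (where _SC_LEVEL1_DCACHE_SIZE and _SC_LEVEL2_CACHE_SIZE are nonstandard extensions that may report 0 on some systems), of choosing a block size from the actual cache sizes; the half-of-L1 policy is my hypothetical choice, not the poster's:

```c
#include <stddef.h>
#include <unistd.h>

/* Ask the OS for a cache size; fall back to a conservative guess when
   the query is unsupported (sysconf returns 0 or -1).
   _SC_LEVEL1_DCACHE_SIZE etc. are glibc extensions, not POSIX. */
static long cache_bytes(int name, long fallback)
{
    long v = sysconf(name);
    return (v > 0) ? v : fallback;
}

/* Size a per-block element count to half of L1, leaving the rest of
   the cache for other data and code -- a hypothetical policy for
   illustration only. */
static long block_elems(size_t elem_size)
{
    long l1 = cache_bytes(_SC_LEVEL1_DCACHE_SIZE, 32 * 1024);
    return l1 / 2 / (long)elem_size;
}
```

The same query works for L2 via _SC_LEVEL2_CACHE_SIZE, so an outer blocking level can be sized the same way.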
This does not mean that all coding problems require this level of sophistication. In some cases, a compiler is the best solution. And for code which is run only once, the efficiency often does not matter, since just getting a non-trivial program to work takes more time than any optimization can ever gain. So there are always going to be many programs, maybe even most, where optimization is not relevant. But there will also be a few programs where optimization is essential, and in some cases is the actual point of the program in the first place.
Jerome Fine