Compiler optimization (was: 8086 (was Re: more talking to the

14 Nov 2003

It was thus said that the Great Guy Sotomayor once stated:
...

 The big problem is the code expansion, but some compilers (at suitably
 high optimization levels) will do loop unrolling.  To see why compilers
 don't do this take a look at the implementation of memcpy in glibc.
 There are *a lot* of cases to handle.  Most of the time the compiler
 writers just let the runtime/intrinsics deal with those cases rather
 than trying to figure out how to make the code generator "do the right
 thing". 
  Which is why for copying, setting, or otherwise manipulating memory (in C)
I use the ANSI C functions memmove(), memcpy() and memset() (or the string
equivalents) rather than rolling my own routines to do the same.  The
compiler can then "see" what I'm trying to do and figure out what's going
on, and I almost never try to second-guess the compiler (I write the code,
then if it's important, generate the assembly to see what the compiler is
doing).  It also helps to know the language.  For instance:
        struct foo a;
        struct foo b;
        b = a;  /* Legal under ANSI C! */
  GCC can assume (since it's the one doing the compiling) that both a and b
are aligned for best access, and it can also pad out struct foo to be a
multiple of, say, 4 bytes (on a 32-bit system) so the assignment (and yes,
you can now do structure assignment in ANSI C) can be done with a "rep
movsl" (which it did when I tested it).  I could have done:
        struct foo a;
        struct foo b;
        memcpy(&b,&a,sizeof(struct foo));
and GCC would probably have generated the same sequence.  Since memcpy() is
defined by ANSI C, the compiler has some leeway to reason about the
parameters to memcpy() (in this case, a and b are aligned and the size is a
known multiple of the word size, so it can inline the call with a "rep
movsl").  I like the first sequence though, since the intent is a bit
clearer.
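  The two approaches above can be sketched side by side.  This is only a
minimal sketch: the members of struct foo are made up (the layout was never
shown), and the function names copy_by_assignment(), copy_by_memcpy(),
foo_equal() and check_copies() are hypothetical.  Whether a given compiler
emits the same instructions for both is something to confirm by reading the
generated assembly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout; the real struct foo's members are not shown. */
struct foo {
    int    id;
    char   name[12];
    double weight;
};

/* Copy by plain structure assignment. */
void copy_by_assignment(struct foo *dst, const struct foo *src)
{
    *dst = *src;
}

/* Copy with memcpy(); it takes addresses, not the structures themselves. */
void copy_by_memcpy(struct foo *dst, const struct foo *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Returns 1 when every member of a and b compares equal, 0 otherwise. */
int foo_equal(const struct foo *a, const struct foo *b)
{
    return a->id == b->id
        && strcmp(a->name, b->name) == 0
        && a->weight == b->weight;
}

/* Small self-check: both copies should match the source member for member. */
void check_copies(void)
{
    struct foo a = { 42, "example", 1.5 };
    struct foo b;
    struct foo c;

    copy_by_assignment(&b, &a);
    copy_by_memcpy(&c, &a);
    assert(foo_equal(&a, &b));
    assert(foo_equal(&a, &c));
}
```

  Compiling this with something like "gcc -O2 -S" and comparing the bodies
of the two copy functions is one way to see what the compiler actually does
with each form.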
  But yes, the generalized memcpy() in glibc is rather complicated, since it
tries to figure out (at run time) the best way to copy memory.  I seem to
recall a discussion several years ago in comp.lang.asm.x86 (I think that's
the group; it's been a while) claiming that with Pentium class machines, you
may get better performance using the floating point registers to copy memory.
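  To give a feel for the kind of run-time decision involved, here is a
much-simplified sketch.  It is not glibc's code: the names copy_strategy,
choose_strategy() and sketch_copy() are hypothetical, it handles only two
cases (both pointers 4-byte aligned, or not), and like memcpy() it assumes
the two regions do not overlap.  A real implementation has far more cases
than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; an illustration only, not glibc's implementation. */
enum copy_strategy { COPY_BYTES, COPY_WORDS };

/* Pick word copies only when both pointers are 4-byte aligned and
   there is at least one full 32-bit word to move. */
enum copy_strategy choose_strategy(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (n >= sizeof(uint32_t)
        && d % sizeof(uint32_t) == 0
        && s % sizeof(uint32_t) == 0)
        return COPY_WORDS;
    return COPY_BYTES;
}

/* Copy n bytes from src to dst (non-overlapping), choosing at run time. */
void *sketch_copy(void *dst, const void *src, size_t n)
{
    unsigned char       *d = dst;
    const unsigned char *s = src;

    if (choose_strategy(dst, src, n) == COPY_WORDS) {
        /* Move whole 32-bit words.  The fixed-size memcpy() calls stay
           within the aliasing rules and are typically compiled down to
           a single load and store each. */
        while (n >= sizeof(uint32_t)) {
            uint32_t w;
            memcpy(&w, s, sizeof w);
            memcpy(d, &w, sizeof w);
            d += sizeof w;
            s += sizeof w;
            n -= sizeof w;
        }
    }

    /* Byte-at-a-time: the tail of an aligned copy, or the whole copy
       when the pointers are not aligned. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

  Even this toy version needs an alignment test, a size test and a tail
loop, which hints at why compiler writers would rather leave the general
case to the runtime.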
...
 Actually most compilers still do a pretty poor job of optimization and
 the windows they use for peep-hole optimization appear to be way too
 small to do anything really useful.
  I know the IRIX C compiler can do global optimizations but that takes
quite a bit of time and processing power; I never bothered with it when I
was programming under IRIX.  And this was at least 10 years ago so it's
marginally on topic here 8-P
  -spc (been programming in ANSI C for over ten years ... )
