It was thus said that the Great Kevin Handy once stated:
Chuck Guzis wrote:
Never trust an optimization unless you see it for yourself.
Looking at a real-world example of generated code, I wrote
[ some C code to test a loop vs. memset() ]
This generated the following output using -O3 optimization:
[ Assembly code from GCC ]
So, in this case the loop operates inline, while the memset() version
pushes parameters onto the stack, then calls an external function.
I suspect that the loop version, which generates inline code with
hardwired values, is slightly faster than the call to memset(), unless
memset() is doing something very odd.
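Chuck's actual test code isn't quoted here, so the following is only a hypothetical stand-in for that kind of comparison: one buffer cleared with an explicit loop, one cleared with memset(). The names clear_with_loop(), clear_with_memset() and BUF_WORDS, and the buffer size, are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

#define BUF_WORDS 64   /* hypothetical size, not from the original post */

static int buf_loop[BUF_WORDS];
static int buf_memset[BUF_WORDS];

/* Variant 1: explicit loop. The bound and element type are known at
   compile time, so the optimizer is free to emit inline stores. */
void clear_with_loop(void)
{
    for (size_t i = 0; i < BUF_WORDS; i++)
        buf_loop[i] = 0;
}

/* Variant 2: library call. Whether this turns into inline stores or a
   call to the external memset() depends on the compiler and its flags. */
void clear_with_memset(void)
{
    memset(buf_memset, 0, sizeof buf_memset);
}
```

Compiling with `gcc -O3 -S` and reading the resulting `.s` file shows what a particular compiler version does with each variant; the answer differs between versions.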
Well ... how ... disappointing.
I played around with GCC 2.7.2.3 (yes, old), and the generated code didn't
change at all, even though I tried several variants of the memset() call
(declaring the variable local, reducing the amount set to 4 bytes, changing
the type, etc.).
Now, I could understand it if the clear routine accepted a char pointer as a
parameter; then there are alignment issues the compiler can't resolve at
compile time. But that wasn't the case, since you were calling memset()
directly on arrays, which will be properly aligned. The only unusual thing I
can think of that memset() might do is that on Pentium-class machines,
writing to memory through the floating-point hardware is faster than using
the 32-bit integer registers, but I could only see that being used in
Pentium-specific code (and even then, only for certain generations of the
Pentium).
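To make the alignment point concrete, here is a minimal sketch of what a clear routine has to do when all it gets is a char pointer. The function name clear_bytes() is hypothetical, and the 4-byte word size is an assumption matching 32-bit registers. Because the pointer's alignment is only known at run time, the routine clears single bytes until it reaches a word boundary, then whole words, then the leftover bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical clear routine: p may have any alignment, so the work is
   split into an unaligned head, an aligned word-sized body, and a tail. */
void clear_bytes(char *p, size_t n)
{
    const uint32_t zero = 0;   /* assumed 4-byte word */

    /* Head: clear single bytes until p is word aligned. */
    while (n > 0 && ((uintptr_t)p % sizeof zero) != 0) {
        *p++ = 0;
        n--;
    }

    /* Body: clear whole words. memcpy() sidesteps strict-aliasing
       problems; compilers normally lower it to a single store. */
    while (n >= sizeof zero) {
        memcpy(p, &zero, sizeof zero);
        p += sizeof zero;
        n -= sizeof zero;
    }

    /* Tail: clear the remaining 0-3 bytes. */
    while (n > 0) {
        *p++ = 0;
        n--;
    }
}
```

With a statically sized array the compiler already knows the alignment and length, so none of this run-time checking is needed. That is why the array case above should, in principle, be easy to inline.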
Sigh.
-spc (Maybe they felt that adding compiler support for memset() wasn't
worth the time since most code uses explicit loops?)