On 1 Sep 2009 at 11:49, Frej Drejhammar wrote:
> A good reference on how to do this is in Michael Abrash's Graphics
> Programming Black Book, Chapter three [1]. BTW, the whole book is
> great reading if you are interested in x86 optimization in the
> [345]86 era.
I'm going to put my foot into it this time...
Using time-in and time-out counters may be fine for code where the
"hot spots" are known, but in a large, complex application that's
almost never the case. Program-counter (PC) sampling, which uses a
periodic interrupt to capture the value of the program counter and
build a histogram, can point up the problem areas.
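To illustrate the mechanism (a minimal sketch, assuming Linux on
x86-64: the REG_RIP context fetch, the crude address-range guess, and
the busy() workload are all placeholders, and a real profiler reads
the program's memory map and does symbol lookup properly):

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <ucontext.h>

#define NBUCKETS 64
static volatile unsigned long hist[NBUCKETS];
static uintptr_t base, span = 1;

/* On each SIGPROF tick, read the interrupted program counter from
   the signal context and bump the matching histogram bucket. */
static void tick(int sig, siginfo_t *si, void *ucv)
{
    (void)sig; (void)si;
    uintptr_t pc = (uintptr_t)((ucontext_t *)ucv)->uc_mcontext.gregs[REG_RIP];
    if (pc - base < span)       /* unsigned compare rejects pc < base */
        hist[(pc - base) * NBUCKETS / span]++;
}

/* Some work whose hot spot we want to find. */
static volatile double sink;
static void busy(void)
{
    for (long i = 0; i < 100000000L; i++)
        sink += (double)i * 1.0000001;
}

int main(void)
{
    /* Crude address range: assumes the linker placed tick() first
       and main() last, which is NOT guaranteed. */
    base = (uintptr_t)(void *)tick;
    span = (uintptr_t)(void *)main - base;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = tick;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGPROF, &sa, NULL);

    struct itimerval it;
    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = 10000;    /* roughly 100 samples/second */
    it.it_value = it.it_interval;
    setitimer(ITIMER_PROF, &it, NULL);

    busy();

    for (int i = 0; i < NBUCKETS; i++)
        if (hist[i])
            printf("bucket %2d: %lu samples\n", i, hist[i]);
    return 0;
}

The buckets with the most samples are where the program spends its
time; that's the whole trick.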
But in all of my own experience, the bit-twiddling sort of
optimization almost never yields results of the hoped-for magnitude.
For example, unrolling loops or scheduling instructions is fine for
compilers with automatic optimizers, but not for the programmer who
is going to hand-optimize. And the timing differences across an
entire family of CPUs often make this counterproductive. (I can
probably still write out the scheduling for CDC 6600 instructions
from memory, as I've done a lot of this sort of thing.)
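For concreteness, this is the sort of source-level unrolling I mean.
A sketch of the technique, not a recommendation: an optimizing
compiler will usually do this (and the instruction scheduling)
itself, tuned to the target CPU.

#include <stdio.h>
#include <stddef.h>

/* Straightforward version: let the compiler unroll and schedule it. */
double sum(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled 4x with four accumulators and a cleanup loop; the
   kind of tweak that may help on one CPU generation and hurt on the
   next.  (Reassociating the sum can also change the low bits of the
   floating-point result.) */
double sum_unrolled(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    double s = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)          /* leftover elements */
        s += a[i];
    return s;
}

int main(void)
{
    double a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    printf("%g %g\n", sum(a, 10), sum_unrolled(a, 10));   /* 55 55 */
    return 0;
}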
What matters most is the algorithm used. Converting 6-digit binary
numbers to decimal by repeatedly subtracting 10 is never going to be
faster, no matter how carefully it's optimized, than using, say, the
Chinese remainder method.
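To put rough numbers on that: the straw man below divides by ten via
repeated subtraction, one digit per pass, which costs over ten
thousand subtractions for a value like 123456; a plain div/mod loop
does the same job in six iterations. (A sketch of those two only; the
Chinese remainder version is more elaborate than an email warrants.)

#include <stdio.h>

/* Straw man: divide by ten via repeated subtraction of 10,
   producing one decimal digit per pass. */
static void to_decimal_subtract(unsigned v, char out[7])
{
    for (int i = 5; i >= 0; i--) {
        unsigned q = 0;
        while (v >= 10) {       /* v becomes v % 10, q becomes v / 10 */
            v -= 10;
            q++;
        }
        out[i] = (char)('0' + v);
        v = q;
    }
    out[6] = '\0';
}

/* Same conversion with one div/mod pair per digit: six iterations
   total, regardless of the value. */
static void to_decimal_divmod(unsigned v, char out[7])
{
    for (int i = 5; i >= 0; i--) {
        out[i] = (char)('0' + v % 10);
        v /= 10;
    }
    out[6] = '\0';
}

int main(void)
{
    char a[7], b[7];
    to_decimal_subtract(123456, a);
    to_decimal_divmod(123456, b);
    printf("%s %s\n", a, b);    /* both print 123456 */
    return 0;
}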
Of course, thinking about one's methods is a lot harder than moving
instructions around, and potentially more disruptive.
For whatever it's worth,
Chuck