Michael B. Brutman wrote:
> I have been performance tuning my TCP/IP stack mostly by looking at the
> code and making educated guesses as to what needs to be fixed. I have
> just wasted a day rewriting the IP checksum routine in assembler for a
> few tenths of a percent improvement and I'm not terribly happy. :-)
#1 assembler optimization rule for old DOS apps: Keep everything in
registers. Memory accesses are really slow for everything 386 and
below. If you have to pass data to other blocks of code, try to do so
via registers instead of the stack.
#2 assembler optimization rule: inline code with macros where it makes
sense. A CALL and its matching RET on an 8088 cost on the order of 70
cycles (the penalty shrinks the more modern the CPU gets, but still),
and inlining also means that many fewer JMPs/RETs/etc.
Other helpful hints include:
- Smaller code is usually faster (see the "memory reads suck" rule)
- JMPs suck (a taken jump empties the prefetch queue, which then has to
be refilled from slow memory) so try to make the most common case in a
comparison-and-jump the "fall-through" case
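To make the two rules and the fall-through hint concrete, here is a minimal sketch in C rather than assembler, using the IP checksum from the question as the example. It is not the routine from the original post: the names FOLD16 and ip_checksum are made up for illustration, it assumes a 32-bit unsigned long, and whether the locals really end up in registers is up to the compiler.

```c
#include <stddef.h>

/* Fold the carries above bit 15 back into the low 16 bits.
 * Written as a macro so no CALL/RET is paid for it (rule #2). */
#define FOLD16(s) (((s) & 0xFFFFUL) + ((s) >> 16))

/* Hypothetical helper: Internet checksum (ones'-complement sum of
 * big-endian 16-bit words) over len bytes starting at p. */
unsigned int ip_checksum(const unsigned char *p, size_t len)
{
    /* The accumulator is a local the compiler can keep in a
     * register for the whole loop (rule #1). */
    unsigned long sum = 0;

    while (len > 1) {
        sum += ((unsigned long)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }

    /* Odd trailing byte: the rare case, so it is the one that
     * branches; even-length data falls straight through. */
    if (len)
        sum += (unsigned long)p[0] << 8;

    /* Two folds are enough: the first leaves at most 0x1FFFE. */
    sum = FOLD16(sum);
    sum = FOLD16(sum);

    return (unsigned int)(~sum & 0xFFFFUL);
}
```

Running it over a header whose checksum field is zero gives the value to store; running it over a header with a correct checksum in place gives 0.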
> Is there a sampling profiler available that will periodically record the
> program counter and give me a histogram of what it finds when it is
> done? I'm thinking of something like oprofile on Linux.
Turbo Profiler, although I've never tried profiling a TSR.
You can profile sections of code yourself by reading the 8253 timer both
before and after a block executes, then subtracting. Mail me if you'd
like some example code. The timer counts from 65535 down to 0 about
18.2 times every second; if the code you're profiling executes in less
than 55ms, then you can do your own microsecond-accurate timing.
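The subtraction step can be sketched as follows. This is only the arithmetic, written in portable C, and not the example code offered above. It assumes timer channel 0 is running in mode 2 so the count drops by one per input clock (in the square-wave mode the BIOS normally sets up, the count drops by two and raw readings are ambiguous), that the input clock is the standard 1.193182 MHz (about 0.8381 microseconds per count), and that unsigned long is at least 32 bits. The function names are made up for illustration.

```c
/* On real hardware the two readings would come from latching channel 0
 * (write 0x00 to port 0x43) and then reading the low byte followed by
 * the high byte from port 0x40, with interrupts disabled around the
 * read. That part is DOS-specific and is left out of this sketch. */

/* Counts elapsed between two 16-bit readings. The counter runs DOWN,
 * so the difference is start - end; masking to 16 bits handles a
 * single wrap from 0 back to 65535. */
unsigned long pit_elapsed_ticks(unsigned int start, unsigned int end)
{
    return ((unsigned long)start - (unsigned long)end) & 0xFFFFUL;
}

/* Convert counts to whole microseconds using 0.8381 us per count.
 * ticks must be at most 65535 so the product fits in 32 bits. */
unsigned long pit_ticks_to_us(unsigned long ticks)
{
    return (ticks * 8381UL) / 10000UL;
}
```

A full trip around the counter works out to just under 55 ms, which is where the limit on the length of the measured block comes from; anything longer wraps more than once and the subtraction can no longer tell.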
--
Jim Leonard (trixter at oldskool.org) http://www.oldskool.org/
Help our electronic games project: http://www.mobygames.com/
Or check out some trippy MindCandy at http://www.mindcandydvd.com/
A child borne of the home computer wars: http://trixter.wordpress.com/