On 4 Feb 2007 at 0:00, Jim Leonard wrote:
> Okay, just making sure. On 486, it was break-even; on Pentium, it was
> indeed faster for most cases to pair instructions. (Although, if you
> wanted to get fancy, you could copy memory even faster by loading 64
> bits at a time into the FPU registers, then storing them out...)

Yeah, but then you wouldn't have code that would run on *any* x86.
To be fair, however, in a bunch of my code, I did have a routine that
started out:

        mov     Is_x86,0        ; assume neither 286 nor 386
        push    sp
        pop     ax
        cmp     ax,sp           ; see if the same
        jne     Init2           ; if not, it's older than a 286
        inc     Is_x86          ; it's at least a 286
        .486
        sgdt    scratch         ; stash the GDT register (6 bytes)
        .8086
        mov     al,byte ptr scratch+5   ; top byte: a 286 stores FFh here
        test    al,al
        jnz     Init2           ; nonzero, so stop at 286
        inc     Is_x86          ; at least a 386
Init2:

so I could make use of those 32-bit registers for moving data around
and calculating CRCs and such.
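
As a rough sketch of the kind of dispatch that flag allows (this is not
the original routine; the labels CopyBlock and Copy16 are made up for
illustration, and I'm assuming DS:SI and ES:DI are already loaded, CX
holds the byte count, and the direction flag is clear):

CopyBlock:                      ; hypothetical name, illustration only
        cmp     Is_x86,2        ; did the probe find a 386 or better?
        jb      Copy16          ; no, stay with 16-bit moves
        .386
        mov     dx,cx           ; save the byte count
        shr     cx,2            ; number of whole dwords
        rep     movsd           ; move 32 bits at a time
        mov     cx,dx
        and     cx,3            ; 0 to 3 leftover bytes
        rep     movsb
        .8086
        ret
Copy16:                         ; hypothetical name, runs on any x86
        shr     cx,1            ; number of words; CF = odd byte left
        rep     movsw           ; string moves leave the flags alone
        adc     cx,cx           ; CX is 0 here, so CX = CF
        rep     movsb
        ret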

That's one of the awful inequities of "improved" CPUs, something like
processor socioeconomics. A trick used to exploit a faster CPU
will often run more slowly on the lower-end CPU than if you had left
in the non-tricky code. So the folks with older CPUs sometimes get
penalized additionally through no fault of their own.

On the CDC 6000 machines it was like that. The "Count the ones"
instruction on the fast 6600 ran in something like 8 cycles, while on
the slower 6400, it took 68 (slower than even a floating divide). So
a clever algorithm that used the pop count instruction on the 6600
really smoked; but on the 6400, it could make the speed disparity
between the two systems look much worse.

Cheers,
Chuck