Chuck Guzis wrote:
 The same sort of behavior exhibits itself
 on some of the x86 CPUs--where:
 l1:
     mov     al,ds:[si]
     dec     cx
     mov     es:[di],al
     jnz     l1
 runs about as fast as rep movsb (and, in cases, somewhat faster).  On
 those same CPUs, replacing the dec cx/jnz with a loop instruction
 will actually slow things down.
Could you clarify which x86 CPUs in particular you're referring to
above?  There isn't a single case from the 808x through the 486 where
that code will beat rep movsb.  Were you referring to Pentiums and higher?
--
Jim Leonard (trixter at oldskool.org)    http://www.oldskool.org/
Help our electronic games project:       http://www.mobygames.com/
Or check out some trippy MindCandy at    http://www.mindcandydvd.com/
A child borne of the home computer wars: http://trixter.wordpress.com/