woodelf wrote:
> Now what I still want to know is a real way to calculate the speed
> of the 8086 other than it often a lot slower than the best timing
> they give in the opcode tables.
Aha! Finally a question I can most definitely answer!
For the 8088, and its implementation in the IBM PC 5150 and compatibles, the
timings in the opcode tables are indeed correct -- if the instructions are
already in the prefetch queue. The 8088 only has a prefetch queue of 4 bytes,
so if your instructions aren't already prefetched it will take 4 cycles per
byte for the Bus Interface Unit to fetch them. For example, "POP reg" is
listed as taking 8 cycles, but if it's NOT been fetched it takes an additional
4 cycles to read the opcode itself. So the total time if not prefetched is
actually 12 cycles.
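To put that arithmetic in one place, here is a minimal sketch in Python. The
function name and constant are made up for illustration, and the model assumes
the only penalty is 4 cycles for each instruction byte that is not yet in the
queue:

```python
# Rough 8088 timing model: opcode-table cycles plus fetch cost for any
# instruction bytes that were not already sitting in the prefetch queue.
FETCH_CYCLES_PER_BYTE = 4  # 8088 bus: one byte per 4-cycle access


def cycles_8088(listed_cycles, unfetched_bytes):
    """Return listed cycles plus 4 cycles per instruction byte still to fetch."""
    return listed_cycles + FETCH_CYCLES_PER_BYTE * unfetched_bytes


# "POP reg" is a one-byte instruction listed at 8 cycles.
print(cycles_8088(8, 0))  # already prefetched: 8
print(cycles_8088(8, 1))  # opcode not yet fetched: 12
```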
That's pretty much it, and that's why you can optimize for 8088 just through
visual inspection 90% of the time. For example, REP MOVS and MUL take long
enough to execute that you can be sure the prefetch queue is full after they're
finished -- the Bus Interface Unit runs independently of the Execution Unit so
it can be fetching while the instruction is already in progress. The other
10%, you should time your code using something like the Zen timer.
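The "queue is full after a long instruction" rule of thumb can be sketched the
same way. This is a deliberately simplified model, not a cycle-accurate one: it
assumes the bus is idle for the whole instruction (no data reads or writes
competing with prefetch), and the function name and the 70-cycle figure are
only illustrative:

```python
# Simplified model of the 8088 prefetch queue filling while the Execution
# Unit is busy. Assumes the Bus Interface Unit has the bus to itself.
QUEUE_SIZE = 4             # 8088 prefetch queue length in bytes
FETCH_CYCLES_PER_BYTE = 4  # one byte per 4-cycle bus access


def queue_after(exec_cycles, queued_before=0):
    """Bytes in the queue once an instruction of exec_cycles has finished."""
    fetched = exec_cycles // FETCH_CYCLES_PER_BYTE
    return min(QUEUE_SIZE, queued_before + fetched)


print(queue_after(8))   # short instruction: only 2 bytes fetched
print(queue_after(70))  # long instruction (illustrative count): queue full, 4
```

In this model anything that runs for 16 cycles or more leaves the queue full,
which is why the long instructions are the easy cases to reason about by eye.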
The 8086 has two improvements over the 8088 in this area:
- The prefetch queue is now 6 bytes long
- It can be filled twice as fast (8086 can access *two* bytes in 4 cycles, as
opposed to just one byte)
...so that's why, even at the same clock speed, the 8086 is about 30% more
efficient than the 8088. My 8086 clone ran at 7.16MHz and, if you go by MHz
alone, should have been about 50% faster than my 4.77MHz 5150; it turned out to
be about 90% faster in benchmarks.
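The two 8086 improvements listed above can be put into numbers with a small
sketch. It assumes an otherwise idle bus and, for the 8086, word-aligned
fetches so that every access really does bring in two bytes; the function name
is hypothetical:

```python
# Compare how quickly each CPU can refill an empty prefetch queue,
# assuming nothing else is using the bus.
def fill_cycles(queue_bytes, bytes_per_access, cycles_per_access=4):
    """Cycles needed to fill an empty queue of queue_bytes."""
    accesses = -(-queue_bytes // bytes_per_access)  # ceiling division
    return accesses * cycles_per_access


print(fill_cycles(4, 1))  # 8088: 4-byte queue, 1 byte per access -> 16
print(fill_cycles(6, 2))  # 8086: 6-byte queue, 2 bytes per access -> 12

# Peak instruction fetch rate in bytes per clock cycle.
print(1 / 4)  # 8088: 0.25
print(2 / 4)  # 8086: 0.5
```

So under these assumptions the 8086 fills a larger queue in fewer cycles than
the 8088 needs for its smaller one.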
Remember, there was no L1 or L2 cache on IBM PC platform motherboards until
higher-speed 386s around 1990.
--
Jim Leonard (trixter at oldskool.org)
http://www.oldskool.org/
Want to help an ambitious games project?
http://www.mobygames.com/
Or check out some trippy MindCandy at
http://www.mindcandydvd.com/