NOW YOU'VE DONE IT!!! I actually went down into the "pit" and found a
relevant data book. In this case it's the 1981 Zilog Data Book. The
portion of the first chapter, which deals with the CPU, has timing diagrams
on pages 18-20, dealing with the opcode fetch (M1) cycle, memory cycles, and
I/O cycles. The latter two types show both a read and a write cycle. From
the timing diagrams and text it is clear that unless one is concerned with
the minutiae of how the external resources are accessed, the only thing
which needs to be accounted for in evaluating the execution time of a series
of instructions is how many T-states, or clock ticks, are required for each
cycle type. Without wait states, an M1 CYCLE is ALWAYS 4 clock ticks long,
a memory read or write CYCLE is 3 clock ticks long, and an I/O read or write
CYCLE is 4 clock ticks.
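A minimal sketch of that accounting in Python, assuming no wait states unless one is requested, and using only the three cycle lengths just given. `CYCLE_TSTATES`, `instruction_time_us` and the `m1_wait_states` knob are hypothetical names for illustration, not anything from the data book.

```python
# Z80 T-states per machine-cycle type with no wait states,
# as quoted above from the 1981 Zilog Data Book.
CYCLE_TSTATES = {"M1": 4, "MEM": 3, "IO": 4}


def instruction_time_us(cycles, clock_mhz, m1_wait_states=0):
    """Execution time in microseconds for a sequence of machine cycles.

    cycles: list of cycle-type names, e.g. ["M1", "MEM", "MEM"].
    m1_wait_states: extra T-states added to each M1 cycle, for systems
    that insert wait states on opcode fetches (hypothetical parameter).
    """
    ticks = sum(CYCLE_TSTATES[c] for c in cycles)
    ticks += m1_wait_states * cycles.count("M1")
    return ticks / clock_mhz  # one T-state lasts 1/clock_mhz microseconds


# One memory read or write cycle at 4 MHz: 3 T-states.
print(instruction_time_us(["MEM"], 4.0))  # 0.75
# An opcode fetch plus three memory cycles, e.g. LD (nn),A: 4+3+3+3 = 13.
print(instruction_time_us(["M1", "MEM", "MEM", "MEM"], 4.0))  # 3.25
```

The first line is the 750 ns memory-cycle spacing that comes up later in the thread; the cycle-rate argument depends only on these per-cycle counts, not on strobe widths.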
It would be beneficial if we kept this discussion restricted to the two
processors at hand, i.e. the Z80 and (if you like) the MOSTEK and NEC
"equivalents" which at least claimed to exhibit the same timing
characteristics, and the NMOS 6502s in the various speed grades available
around 1980. There's no question that this doesn't
include the Z8 or the Z180 or the Z280 or any other part, nor does it apply
directly to the various other parts which later inherited the 'Z' or some of
the genuine Z80 characteristics. There's no doubt the Zilog folks, clever
as they were, figured out ways in which to improve their products in later
generations.
I thought I pretty well covered why it's not relevant how long the bird dips
his beak into the memory, so long as you remain mindful of how often,
maximally, it can do so. Your statement indicates an awareness of this, but
both your statement and your asserted conclusions suggest you've confused
the memory access length with the memory cycle rate or frequency. For
purposes of clarity I've postulated that there are two parts to every cycle.
There are other ways of looking at this, but . . . There's the setup phase
and the "active" (for want of a better term) phase. The Z80 active phase
cannot be detected externally in the case of instructions which don't access
external resources. On a 6502 the active phase is defined by the
positive-going (high) portion of the Phase-2 clock. On the Z-80 an active
phase can be detected only when it involves external resources. I'd really
appreciate your referring me to a page in a data book current at the time
which makes any suggestion at all that the Z80 requires fewer than 3
T-states to execute a non-M1 memory cycle. Even in an LDIR, where there are
NO opcode fetches at all once the process begins, each and every memory
access takes a minimum of three clock ticks or "t-states." This is true
whether the processor is clocked at its maximum or at a much slower rate.
This tells us that executing an absolute jump takes the duration of the
instruction fetch, which means the opcode plus the two-byte target address.
The diagram clearly shows four normal T-states, plus the wait state commonly
inserted in M1 cycles. So we now have it straight from the horse's mouth
that, in this customary implementation with one wait state (as your Ampro
Little Board undoubtedly allowed you to observe with your calibrated logic
analyzer), the M1 cycle takes a total of 5 T-states, and the two subsequent
fetch cycles take three T-states each.
So, the absolute jump takes minimally 4+3+3 (=10) clock ticks and, as typically
implemented, 5+3+3 (=11) clock ticks, not 12 as I estimated, to fetch the
entire instruction. The next cycle will ostensibly be an M1 cycle at the
16-bit address fetched as part of the absolute jump instruction. At the
very familiar 4 MHz, this is 2.75 microseconds. On the 6502, an opcode
fetch takes 1 clock tick. A fetch from memory takes 1 clock tick, and a
write to memory takes 1 clock tick. Consequently an absolute jump takes
three clock ticks. At 2 MHz, this is 1.5 microseconds.
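The arithmetic in the last two paragraphs, written out as a short Python sketch. The tick counts are the ones given above (Z80 absolute jump with and without the M1 wait state, 6502 absolute JMP); the helper name `microseconds` is hypothetical.

```python
def microseconds(ticks, clock_mhz):
    """Convert a count of clock ticks to microseconds at clock_mhz."""
    return ticks / clock_mhz


z80_jp_min = 4 + 3 + 3      # M1 + two address reads, no wait state
z80_jp_typical = 5 + 3 + 3  # same, with one wait state in M1
mos6502_jmp = 1 + 1 + 1     # opcode + two address bytes, one tick each

print(microseconds(z80_jp_min, 4.0))      # 2.5
print(microseconds(z80_jp_typical, 4.0))  # 2.75
print(microseconds(mos6502_jmp, 2.0))     # 1.5
```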
You're right, the 6502 takes two clock ticks to execute any of the
"single-cycle" instructions. However, due to the designers' clever use of
simple pipelining, it executes two of them in three clock ticks and three of
them in four, as the execution of each wholly internal "single-cycle"
operation is overlapped with the next opcode fetch. So, if you want to argue
the point, each takes only one, except that the first one takes two.
Its opcode was fetched during the last cycle of the previous instruction,
though, unless it was reached via jump, branch, call or interrupt. What
this means is that when you have a succession of ALU operations, e.g., a
sequence of shifts, the Z80 should win, right? . . . as it takes only five
clock ticks each (with the requisite wait state) to fetch and execute four
successive left shifts, or 20 clock ticks, which equals 5 microseconds. On
the 6502, because of the pipelining, it takes 5 clock ticks, which, in the
case of the 2 MHz processor, is 2.5 microseconds.
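The comparison in the previous paragraph can be reproduced with a few lines of Python. This encodes the overlap model exactly as stated above (the first internal 6502 instruction costs 2 ticks and each following one 1 tick; each Z80 register operation costs 5 ticks including the M1 wait state). It models that accounting, not a cycle-accurate trace of either chip, and the function names are hypothetical.

```python
def z80_ticks(n, ticks_each=5):
    """n back-to-back one-byte Z80 register operations.

    5 ticks each = 4 T-states plus one M1 wait state, as assumed above.
    """
    return n * ticks_each


def mos6502_ticks_overlapped(n):
    """n back-to-back internal 6502 operations under the overlap model
    stated above: the first costs 2 ticks, each later one costs 1."""
    return 0 if n == 0 else n + 1


shifts = 4
print(z80_ticks(shifts) / 4.0)                 # 5.0 microseconds at 4 MHz
print(mos6502_ticks_overlapped(shifts) / 2.0)  # 2.5 microseconds at 2 MHz
```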
Please take a look at the additional comments I've embedded in the quoted
email below.
Dick
-----Original Message-----
From: Allison J Parent <allisonp(a)world.std.com>
To: Discussion re-collecting of classic computers
<classiccmp(a)u.washington.edu>
Date: Tuesday, April 13, 1999 7:39 PM
Subject: Re: z80 timing... 6502 timing
<the relatively short memory access strobe, while I was talking about the
<frequency at which they occur, as defined in the spec. I agree completely
Yes, so? Often the Z80 is moving 16 bits; with 8-bit-wide memory it's going
to take several cycles. If it were a Z280 that would be even more biased,
as it uses fewer "ticks" per cycle and the bus is 16 bits wide. Counting
ticks or whatever is, as I've repeatedly stated, meaningless save for
discussions of how memory is used, not of who is faster.
An aside at this point: the Z280 runs different cycle timing, so at 4 MHz it
would beat the base Z80 of the same speed, and the Z380 (in Z80 native mode)
beats that, as the cycles have been shortened again.
This is irrelevant to the comparison between the processors in the title.
<personal. The fact remains, that the memory CYCLE is three clock ticks
<long, as defined in the spec (though I haven't looked at it in 15 years or
It is not, and like I said, the spec is in front of me as I type. Worst case
it's 2. But that in itself is, again, meaningless.
please, Please, PLEASE take another look. Remember it's the CYCLE length,
not the access strobe width that's relevant to this discussion.
<so since I haven't yet unearthed my Zilog or Mostek data books) and if you
<look at the pictures you saw with your logic analyzer, you should have seen
<two read pulses of whatever length they were, spaced at very nearly 750 ns
<each time you saw the execution of an absolute jump, or any other
<instruction which consists of an opcode followed by a 16-bit address. The
<same is true of writes. They take one memory cycle, which is three clock
<ticks long, for each byte, although the memory write strobe is a mite
<shorter than the read strobe, IIRC, which I might not, but . . .
Your memory is faulty, and that 750 ns number is still meaningless. The
only number for comparative purposes is the amount of time it takes to do
an absolute jump. For a Z80 at 4 MHz that will be 2.5 us. It will require
memory in the 250 ns range to do it.
I've frequently demonstrated, of late, that my formerly steel-trap-mind has
moved significantly in the direction of the polyethylene colander. However,
I did still remember the details I pointed out yesterday, with the
exception that I thought the minimal number of T-states in the M1 cycle was
five rather than four. . . but I did remember on which pages the diagrams
were printed.
<were inserted as they often were for M1 cycles. Nevertheless, commonly used
<instructions were MUCH faster on the 2 MHz 6502 than on the 4 MHz Z-80.
A 2 MHz 6502 executes a 1-byte instruction (say INX) in 2 machine cycles, and
that takes 1 us.
Yes, but it executes two of them in three machine cycles, and three in four,
etc. due to the pipelining which allows overlapped execution and opcode
fetch.
An INC B (any register) takes 4 Z80 clocks at 4 MHz... damn if that
doesn't happen to be 1 us! Where is the speed difference?
Yes, but if you execute two of them, it takes 8, and if you execute three it
takes 12. That's 3 microseconds. Now, a 2 MHz 6502 takes only 2
microseconds to increment a register three times.
According to my book a 6502 absolute jump takes 3-5 cycles, and in the 5
cycle case it's 2.5 us.
Not quite . . . the 5-tick jump is an indirect jump, opcode 0x6C. An
absolute jump, opcode 0x4C, takes only and exactly three clock ticks.
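For reference, the two NMOS 6502 JMP forms and their times at the clock rates discussed in this thread, as a small Python table. The tick counts are the ones stated just above; `JMP_TICKS` is a hypothetical name. Allison's 2.5 us figure corresponds to the indirect form at 2 MHz, while the absolute form at 2 MHz comes to 1.5 us.

```python
# NMOS 6502 JMP forms: opcode -> clock ticks
JMP_TICKS = {
    0x4C: 3,  # JMP absolute
    0x6C: 5,  # JMP (indirect)
}

for opcode, ticks in JMP_TICKS.items():
    for clock_mhz in (1.0, 2.0):
        print(f"{opcode:#04x} at {clock_mhz} MHz: {ticks / clock_mhz} us")
```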
<probably measure three microseconds for those twelve clock ticks (T-states)
<which is EXACTLY how long a 1 MHz 6502 takes to do that. Hence, I conclude
Exactly my point. The 6502 is not faster, it only marches to a different
drummer.
Yes, but if you had carefully read the quoted statement from my previous
email, you'd have noticed I referenced it to a 1 MHz processor, not a 2 MHz
one as was used in the comparison. So you see, for the sake of this
comparison, the drummer is beating twice as quickly.
<I've concluded that most code I've seen underutilizes the internal resources
<and overutilizes the external ones. Code like that favors processors with
<more time-efficient use of the external resources. Hence, my assertion that
<there's reason to believe the 6502 at 2 MHz could outrun the 4 MHz Z-80 in
<more or less typical code and in a more or less typical hardware
No, again. It can match the Z80, and in some cases it's better or worse.
Well, I'd be very pleased to see a block of code written to accomplish any
useful (or otherwise definable) task in less time on a 4 MHz Z-80 than it
would take a 2 MHz 6502 to achieve the same end. I'm not saying I can write
better code than you, nor am I even saying it can't be done. I've never
seen it, though. It's virtually impossible to do this against a 1 MHz 6502,
and you're allowing me to continue the comparison between a 4 MHz Z-80A and a
2 MHz 6502A, right? The immediately following statement which you quoted
from my previous email states my view on this. I'd still be interested, if
only as a point of curiosity.
<environment. Code written to make better than average utilization of the
<internals of a Z-80 might fare better against equally well-written code on
<a 6502. I'm comfortable with the reality that I'll probably never know for
<certain. Since neither processor is particularly important these days, it's not
<terribly important to me either.
Agreed, well-written code is essential for either to do useful work.
<None of this is really worth getting all excited about because, by the way,
<in spite of its "better" performance, (by my assessment) the 6502 didn't
<accomplish more useful work on MY behalf, because I used a Z-80 running CP/M
<every chance I got due to the abundance of really decent tools and office
<automation software.
Therein lies the key. A good system is not always defined by its
hardware. Systems are a combination of practical hardware and functional
software. This accounts for why, despite their flaws, the TRS-80, the Apple II,
Z80 CP/M-based systems, and others flourished. Most people didn't program
8080z806502ti990018028085680980886800065815; they ran BASIC or a word
processor. The run-on of part numbers was deliberate, as to most people
the CPU used was just a number.
Quite so, and I'd still be waiting today to get my old 6502 to run WordStar
under CP/M . . .
Allison