Reproduction micros
Peter Corlett
abuse at cabal.org.uk
Thu Jul 21 15:22:37 CDT 2016
On Wed, Jul 20, 2016 at 09:02:41PM +0200, Liam Proven wrote:
> On 19 July 2016 at 17:04, Peter Corlett <abuse at cabal.org.uk> wrote:
[...]
>> RISC implies a load-store architecture, so that claim is redundant.
> Could you expand on that, please? I think that IKWYM but I'm not sure.
A load-store architecture is one where the ALU only operates on registers. The
name comes from having separate instructions to load registers from memory, and
store them to memory.
The converse is register-memory, where ALU instructions can work directly on
memory. However, this means that the instructions have to do quite a lot of
work because now data has to be brought in from memory to an anonymous register
to be worked on and then stored back to the same location. This also results in
a proliferation of instruction and addressing mode combinations. Sounds rather
CISCy, doesn't it?
Meanwhile, a load-store architecture would have to decompose that into simpler
independent load, operate, store instructions. Hey presto, RISC!
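To make that decomposition concrete, here's a toy sketch in Python. It has nothing to do with any real ISA's encoding, and the function names are invented for illustration. The register-memory add hides a load and a store around its ALU operation. The load-store version spells out the same three steps, with the anonymous temporary promoted to a named register.

```python
# Toy model, not a real ISA: memory is a dict of address -> value,
# registers a dict of name -> value. All names here are hypothetical.
MASK32 = 0xFFFFFFFF

def cisc_add_mem(mem, regs, addr, rsrc):
    """Register-memory style: ONE instruction does mem[addr] += rsrc."""
    tmp = mem[addr]                       # hidden load into an anonymous register
    tmp = (tmp + regs[rsrc]) & MASK32     # the actual ALU work
    mem[addr] = tmp                       # hidden store back to the same location

def load(mem, regs, rd, addr):
    regs[rd] = mem[addr]                  # only loads read memory

def add(regs, rd, rn, rm):
    regs[rd] = (regs[rn] + regs[rm]) & MASK32   # ALU touches registers only

def store(mem, regs, rs, addr):
    mem[addr] = regs[rs]                  # only stores write memory

def risc_add_mem(mem, regs, addr, rsrc, rtmp):
    """Load-store style: the same work as three simple, independent instructions."""
    load(mem, regs, rtmp, addr)
    add(regs, rtmp, rtmp, rsrc)
    store(mem, regs, rtmp, addr)

# Both styles leave memory in the same state.
mem_cisc, regs_cisc = {0x100: 40}, {"r1": 2}
cisc_add_mem(mem_cisc, regs_cisc, 0x100, "r1")

mem_risc, regs_risc = {0x100: 40}, {"r1": 2, "r2": 0}
risc_add_mem(mem_risc, regs_risc, 0x100, "r1", "r2")

print(mem_cisc, mem_risc)   # {256: 42} {256: 42}
```

The only visible difference is that the load-store version needs a spare register, which is why RISC designs tend to come with plenty of them.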
[...]
>> IMO, it's the predicated instructions that is ARM's special sauce and the
>> real innovation that gives it a performance boost. Without those, it'd be
>> just a 32 bit wide 6502 knockoff.
> Do tell...?
You've already answered the "6502 knockoff" elsethread, so I assume you're
asking about the predicated instructions.
A predicated instruction is one that does or does not execute based on some
condition. CISC machines generally use condition codes (aka flags), and only
have predicated branch instructions. Branch-not-equal, that kind of thing.
In ARM, *all* instructions can be predicated. Because instructions are 32 bits
wide, it has the luxury of allocating four bits to select from one of 16
possible predicates based on the CPU flags. One predicate is "always", so
instructions can also be executed unconditionally.
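For the curious, here's a rough Python sketch of how those four bits (bits 31..28 of a classic 32-bit ARM instruction word) map onto tests of the N, Z, C and V flags. The helper names are my own, not ARM's. Encoding 0b1111 was "never" (NV) on early cores; later architecture versions reused it for a few unconditional instructions, so treating it as "never" here is an assumption that only holds for the old parts.

```python
# Condition field of a classic 32-bit ARM (A32) instruction: bits 31..28.
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]

def cond_field(instr):
    """Extract the 4-bit condition field from a 32-bit instruction word."""
    return (instr >> 28) & 0xF

def cond_passed(cond, n, z, c, v):
    """True if an instruction with condition `cond` would execute,
    given the N, Z, C, V flags (each 0 or 1)."""
    checks = {
        0x0: z == 1,                  # EQ: equal
        0x1: z == 0,                  # NE: not equal
        0x2: c == 1,                  # CS/HS: unsigned >=
        0x3: c == 0,                  # CC/LO: unsigned <
        0x4: n == 1,                  # MI: negative
        0x5: n == 0,                  # PL: positive or zero
        0x6: v == 1,                  # VS: overflow
        0x7: v == 0,                  # VC: no overflow
        0x8: c == 1 and z == 0,       # HI: unsigned >
        0x9: c == 0 or z == 1,        # LS: unsigned <=
        0xA: n == v,                  # GE: signed >=
        0xB: n != v,                  # LT: signed <
        0xC: z == 0 and n == v,       # GT: signed >
        0xD: z == 1 or n != v,        # LE: signed <=
        0xE: True,                    # AL: always
        0xF: False,                   # NV: never (early cores only, see above)
    }
    return checks[cond]

add_al = 0xE0810002   # ADD   r0, r1, r2  (condition AL)
add_eq = 0x00810002   # ADDEQ r0, r1, r2  (same instruction, condition EQ)
print(COND_NAMES[cond_field(add_al)], COND_NAMES[cond_field(add_eq)])   # AL EQ
```

Note that the two encodings differ only in the top nibble: any instruction can be made conditional just by changing those four bits.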
An occasionally forgotten feature is that ALU operations also have an S bit to
indicate whether they should update the flags based on the result, or leave
them alone.
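A quick sketch of what the S bit changes, again in Python with invented helper names. In a data-processing instruction the S bit is bit 20, so ADD r0, r1, r2 and ADDS r0, r1, r2 differ only in that bit. Both compute the same 32-bit sum. Only the S form replaces the flags with ones derived from the result.

```python
MASK32 = 0xFFFFFFFF

def s_bit(instr):
    """S bit of an A32 data-processing instruction (bit 20)."""
    return (instr >> 20) & 1

def add32(a, b, flags, s):
    """32-bit add. Returns (result, flags): new NZCV flags if s is set,
    otherwise the old flags passed in, untouched."""
    full = a + b
    result = full & MASK32
    if not s:
        return result, flags                        # ADD: leave flags alone
    n = (result >> 31) & 1                          # negative
    z = int(result == 0)                            # zero
    c = int(full > MASK32)                          # unsigned carry out
    v = (((a ^ result) & (b ^ result)) >> 31) & 1   # signed overflow
    return result, (n, z, c, v)

old = (0, 0, 0, 0)
print(add32(0xFFFFFFFF, 1, old, s_bit(0xE0810002)))   # ADD:  (0, (0, 0, 0, 0))
print(add32(0xFFFFFFFF, 1, old, s_bit(0xE0910002)))   # ADDS: (0, (0, 1, 1, 0))
```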
Between these, a conditional branch over a handful of instructions can be
replaced by predicating those instructions, with the S bit clear so that they
don't update the flags and every one of them still tests the original
condition. Not only has the conditional branch been deleted completely from
the instruction stream, which makes code noticeably more compact, but there's
now no branch-induced pipeline stall. Special sauce.
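A commonly used illustration is Euclid's GCD by repeated subtraction. Below is a rough Python simulation of a predicated ARM loop and a conventional branchy one, with the assembly in comments. The simulation, the helper names and the choice to count branch instructions executed (rather than model any real pipeline) are my own assumptions. Both inputs must be positive, or the subtraction never terminates.

```python
MASK32 = 0xFFFFFFFF

def flags_from_cmp(a, b):
    """NZCV flags after CMP a, b (a - b on 32-bit values)."""
    result = (a - b) & MASK32
    n = (result >> 31) & 1
    z = int(result == 0)
    c = int(a >= b)                               # ARM carry = NOT borrow
    v = (((a ^ b) & (a ^ result)) >> 31) & 1      # signed overflow
    return n, z, c, v

def passed(cond, flags):
    """Only the few conditions this example needs."""
    n, z, c, v = flags
    return {"EQ": z == 1, "NE": z == 0,
            "GT": z == 0 and n == v, "LT": n != v}[cond]

def gcd_predicated(r0, r1):
    # gcd   CMP   r0, r1
    #       SUBGT r0, r0, r1   ; S bit clear, so the CMP flags survive
    #       SUBLT r1, r1, r0   ; ...and this still tests the CMP result
    #       BNE   gcd
    branches = 0
    while True:
        flags = flags_from_cmp(r0, r1)
        if passed("GT", flags):
            r0 = (r0 - r1) & MASK32
        if passed("LT", flags):
            r1 = (r1 - r0) & MASK32
        branches += 1                     # BNE, executed every iteration
        if not passed("NE", flags):
            return r0, branches

def gcd_branchy(r0, r1):
    # gcd   CMP r0, r1
    #       BEQ done
    #       BLT less
    #       SUB r0, r0, r1
    #       B   gcd
    # less  SUB r1, r1, r0
    #       B   gcd
    # done
    branches = 0
    while True:
        flags = flags_from_cmp(r0, r1)
        branches += 1                     # BEQ done
        if passed("EQ", flags):
            return r0, branches
        branches += 1                     # BLT less
        if passed("LT", flags):
            r1 = (r1 - r0) & MASK32
        else:
            r0 = (r0 - r1) & MASK32
        branches += 1                     # B gcd

print(gcd_predicated(12, 18))   # (6, 3)
print(gcd_branchy(12, 18))      # (6, 7)
```

Same answer, same number of trips round the loop, but the predicated version is four instructions long and executes one branch per iteration instead of three.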
Unsurprisingly, x86 eventually noticed this sort of thing is useful and pinched
the idea (the CMOVcc conditional moves), but did it in the usual half-arsed
fashion that it is famous for.