> I still wonder about the utility of the "count the 1's" instruction.
It's one of those "if you need it not much else will do but if you
don't you generally can't imagine what it's good for" things. It's
possible to generate bitcount with lg(N) mask-mask-shift-add cycles,
where N is the number of bits in a word, but that's slow compared to a
dedicated count-bits instruction.
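For concreteness, here is a rough sketch of the mask-shift-add approach for
N = 32, so lg(N) = 5 steps. The function name popcount32 is just a label I
picked for illustration; it is not taken from any particular library.

```c
#include <stdint.h>

/* Count the 1 bits in a 32-bit word using lg(32) = 5 mask-shift-add steps.
 * Each step adds adjacent fields pairwise, doubling the field width:
 * 1-bit fields -> 2-bit sums -> 4-bit sums -> ... -> one 32-bit sum. */
static unsigned popcount32(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1)  & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2)  & 0x33333333u);
    x = (x & 0x0F0F0F0Fu) + ((x >> 4)  & 0x0F0F0F0Fu);
    x = (x & 0x00FF00FFu) + ((x >> 8)  & 0x00FF00FFu);
    x = (x & 0x0000FFFFu) + ((x >> 16) & 0x0000FFFFu);
    return (unsigned)x;
}
```

Each of the five lines costs two masks, a shift, and an add, which is where
the "slow compared to a dedicated instruction" comes from.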
The most plausible use for it, offhand, seems to me to be when working
with error-correcting codes; XOR + bitcount gives you the Hamming
distance between two codewords quickly.
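A minimal sketch of the XOR + bitcount idea, assuming 32-bit codewords. The
names bitcount32 and hamming32 are made up for this example, and the software
loop merely stands in for whatever count-bits instruction the hardware offers.

```c
#include <stdint.h>

/* Software stand-in for a count-bits instruction: clear the lowest
 * set bit until none remain, counting the iterations. */
static unsigned bitcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clears the lowest set bit */
        n++;
    }
    return n;
}

/* Hamming distance: XOR marks the bit positions where the two
 * codewords differ; the bitcount says how many there are. */
static unsigned hamming32(uint32_t a, uint32_t b)
{
    return bitcount32(a ^ b);
}
```

For example, hamming32(0xB, 0xE) compares 1011 with 1110; they differ in two
positions, so the result is 2.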
According to a talk I saw on the web a few years ago now, by one of
the guys in charge of P3 (or P4), people still ask "can I have
such-and-such an instruction"!
The nerve of them! Expecting what people want to actually matter to
the processor designers!
My own preferred instruction would be what I've thought of as "ALU".
It would take 6 (32-bit) or 7 (64-bit) input operands and one output
operand; one of the input operands gives the truth table for a boolean
function which is applied bitwise across the other inputs.
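To make the operand counts concrete: a 32-bit truth table has 2^5 entries, so
it can describe any boolean function of 5 inputs (5 data operands + 1 table =
6 inputs); a 64-bit table has 2^6 entries, hence 6 data operands + 1 table = 7.
Below is a software emulation of the 32-bit case, purely as a sketch. The name
alu32 and the bit ordering (in[0] supplies the least significant bit of the
table index) are my assumptions for illustration; nothing above pins them down.

```c
#include <stdint.h>

/* Emulate the hypothetical 32-bit "ALU" instruction.
 * table: 32-entry truth table, one bit per combination of 5 input bits.
 * in:    the 5 data operands.
 * For each bit position, the 5 input bits at that position form an
 * index 0..31 into the table; the selected table bit is the output bit.
 * Assumed convention: in[k] contributes bit k of the index. */
static uint32_t alu32(uint32_t table, const uint32_t in[5])
{
    uint32_t out = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        unsigned idx = 0;
        for (unsigned k = 0; k < 5; k++)
            idx |= ((in[k] >> bit) & 1u) << k;
        out |= ((table >> idx) & 1u) << bit;
    }
    return out;
}
```

Under that convention, table 0x88888888 computes in[0] AND in[1] (ignoring the
other three operands), 0x66666666 computes in[0] XOR in[1], and 0x96696996
computes the XOR of all five.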
/~\ The ASCII                             der Mouse
\ / Ribbon Campaign
 X  Against HTML           mouse at rodents.montreal.qc.ca
/ \ Email!       7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B