Consider a machine with a word length of 64 bits. This machine
represents floating point numbers with a 16 bit exponent and a 48 bit
mantissa (nothing unusual so far).
Okay. So we have the length of short = int = long = 64 bits. So far
so good.
Well, we _can_ have short/int/long all having 64 bits. Even char,
too, I think.
However, given such a generous word length, the designers of this
machine decide not to dedicate special hardware toward handling 64
bit integers, but have said that 48 bits should be long enough for
anyone, and so treat integer arithmetic as a subset of floating point
(straightforward enough). So not all values of a 64 bit word
reflect valid integers--the exponent must be a certain
value--anything else is floating point.
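To make the setup concrete, here is one way such an encoding might be sketched in C. The field positions and the magic exponent value are invented for illustration; nothing above specifies them:

```c
#include <stdint.h>

/* A sketch of the hypothetical format.  All field positions and the
   "magic" integer exponent value here are assumptions made for
   illustration, not taken from the description above. */
#define EXP_SHIFT     48
#define EXP_MASK      0xFFFFull
#define MANT_MASK     0xFFFFFFFFFFFFull   /* low 48 bits */
#define INT_EXPONENT  0x4030ull           /* made-up magic value */

/* A 64-bit word is a valid integer only when its exponent field
   holds the magic value; anything else is floating point. */
static int is_integer(uint64_t word) {
    return ((word >> EXP_SHIFT) & EXP_MASK) == INT_EXPONENT;
}

static uint64_t encode_int(uint64_t value48) {
    return (INT_EXPONENT << EXP_SHIFT) | (value48 & MANT_MASK);
}
```

Any 64-bit word whose exponent field differs from the magic value is, on this machine, a floating point value rather than an integer.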
Well, this makes it impossible to implement long long, which (loosely
put) must have at least 64 bits of range. But let's pretend that long
long doesn't exist, or perhaps that all the bit counts you give are
doubled, or some such.
Now, here's where we get into sticky territory. Since C draws no
data type distinction between ints used in bitwise logical operations
and ints used in arithmetic, is it possible to implement C on this
machine?
I'm not sure. The C99 draft I have says (6.2.6.1)
[#5] Certain object representations need not represent a
value of the object type. If the stored value of an object
has such a representation and is read by an lvalue
expression that does not have character type, the behavior
is undefined. If such a representation is produced by a
side effect that modifies all or any part of the object by
an lvalue expression that does not have character type, the
behavior is undefined.41) Such a representation is called a
trap representation.
Footnote 41 says
41)Thus, an automatic variable can be initialized to a trap
representation without causing undefined behavior, but
the value of the variable cannot be used until a proper
value is stored in it.
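The footnote's point can be sketched in a few lines (the function name is mine, for illustration):

```c
/* An automatic variable may start life holding a trap representation;
   per footnote 41, that alone causes no undefined behavior.  Only
   reading it before a proper value is stored would be undefined. */
static int footnote_41_demo(void) {
    int x;        /* x may hold a trap representation here */
    x = 42;       /* storing a proper value makes x usable */
    return x;     /* reading x is now well-defined */
}
```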
The boolean operations on unsigned integer types would have to be
implemented in a way that's careful to avoid messing up the magic
exponent-field value (they couldn't just be 64-bit boolean operations,
in general).
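For instance, the implementation's bitwise operators might have to look something like this, using an invented field layout (magic exponent in the top 16 bits, integer value in the low 48); these helper names and constants are assumptions, not anything from the standard:

```c
#include <stdint.h>

/* Assumed layout for illustration: exponent in bits 48-63, integer
   value in the low 48 mantissa bits, a fixed magic exponent marking
   the word as an integer. */
#define MANT_MASK     0xFFFFFFFFFFFFull
#define INT_EXPONENT  (0x4030ull << 48)

/* Bitwise AND on this machine cannot be a plain 64-bit AND: it must
   combine only the mantissa bits and then re-attach the magic
   exponent, or the result would not be a valid integer word. */
static uint64_t int_and(uint64_t a, uint64_t b) {
    return INT_EXPONENT | (a & b & MANT_MASK);
}

static uint64_t int_xor(uint64_t a, uint64_t b) {
    /* a plain XOR of two valid integers would zero the exponent
       field; restore it explicitly */
    return INT_EXPONENT | ((a ^ b) & MANT_MASK);
}
```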
The wording about "character type" above appears to be intended to
support idioms like
for (i = 0; i < sizeof thing1; i++)
    ((char *)&thing2)[i] = ((char *)&thing1)[i];
However, there are constraints on how integers are represented,
specifically on how signed and unsigned integers are related, that
might break the above. 6.2.6.2 (remember the missing superscripting):
[#1] For unsigned integer types other than unsigned char,
the bits of the object representation shall be divided into
two groups: value bits and padding bits (there need not be
any of the latter). If there are N value bits, each bit
shall represent a different power of 2 between 1 and 2N-1,
so that objects of that type shall be capable of
representing values from 0 to 2N-1 using a pure binary
representation; this shall be known as the value
representation. The values of any padding bits are
unspecified.44)
44)Some combinations of padding bits might generate trap
representations, for example, if one padding bit is a
parity bit. Regardless, no arithmetic operation on valid
values can generate a trap representation other than as
part of an exceptional condition such as an overflow, and
this cannot occur with unsigned types. All other
combinations of padding bits are alternative object
representations of the value specified by the value bits.
[#2] For signed integer types, the bits of the object
representation shall be divided into three groups: value
bits, padding bits, and the sign bit. There need not be any
padding bits; there shall be exactly one sign bit. Each bit
that is a value bit shall have the same value as the same
bit in the object representation of the corresponding
unsigned type (if there are M value bits in the signed type
and N in the unsigned type, then M<=N). If the sign bit is
zero, it shall not affect the resulting value. If the sign
bit is one, the value shall be modified in one of the
following ways:
-- the corresponding value with sign bit 0 is negated
(sign and magnitude);
-- the sign bit has the value -(2N) (two's complement);
-- the sign bit has the value -(2N-1) (one's complement).
Which of these applies is implementation-defined, as is
whether the value with sign bit 1 and all value bits zero
(for the first two), or with sign bit and all value bits 1
(for one's complement), is a trap representation or a normal
value. In the case of sign and magnitude and one's
complement, if this representation is a normal value it is
called a negative zero.
...
[#5] The values of any padding bits are unspecified.45) A
valid (non-trap) object representation of a signed integer
type where the sign bit is zero is a valid object
representation of the corresponding unsigned type, and shall
represent the same value.
45)[...basically a repeat of footnote 44, above...]
I think this permits what you sketch, but your sketch is brief enough
I'm not entirely sure. It's also possible I've missed a constraint
somewhere else that's relevant.
I'll ask my go-to C guy about this.
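The 6.2.6.2#5 guarantee, at least, is easy to demonstrate with character-type access; the helper name here is mine:

```c
#include <string.h>

/* Reinterpret a non-negative int's object representation as the
   corresponding unsigned type, using character-type access via
   memcpy (which 6.2.6.1 permits even on trap representations).
   Per 6.2.6.2#5, a valid signed representation with sign bit zero
   is a valid unsigned representation of the same value. */
static unsigned int reread_as_unsigned(int s) {
    unsigned int u;
    memcpy(&u, &s, sizeof u);
    return u;
}
```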
As far as "bytes" on systems go, perhaps the attribute of byte
addressability makes sense on short word-length machines, but I don't
believe that it's necessary for longer word length machines. [...]
I think byte addressing is more a matter of functional fixedness than
anything else.
FSVO "byte", I agree. However, since we've been discussing C, there's
a _lot_ of (C) code that assumes that char * has the same size and
representation as other pointer types, even though there's no
justification in the C spec for such an assumption. (POSIX, on the
other hand, may impose such a restriction; I don't have even a draft of
POSIX handy to check.)
/~\  The ASCII Mouse
\ /  Ribbon Campaign
 X   Against HTML               mouse at rodents-montreal.org
/ \  Email!            7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B