Must C language elements be represented by binary numbers?  I've read
and heard that it's impossible to do otherwise, but is it really?
Yes and no.
They must be implemented in a way that looks like binary as far as the
semantics visible to the abstract C machine are concerned. Whether the
actual underlying storage mechanism is two-state or not is beyond the
scope of the standard. (This is sometimes called the "as if" rule:
everything else can be thrown out the window as long as the compiler
and run-time (if applicable) arrange that it look as if the spec were
being followed as far as C code can tell.)
So, you can use decimal if you want, but you'll have to go through
gyrations to make things like & and << and + operate _as if_ everything
were actually binary. (This would probably be difficult, wasteful of
storage, or both, especially considering the requirement that every
object be viewable as an array of chars without loss of information.
It would not be impossible.)
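As a concrete illustration (my own sketch, not from the standard): on
any conforming implementation, however exotic the hardware, the results
below are fixed by the value semantics of the operators, so a decimal
machine would have to arrange to produce exactly the same answers.

#include <assert.h>

int main(void)
{
    unsigned int x = 0x2Cu;          /* 101100 in binary, 44 decimal */

    /* These results are dictated by the abstract machine's value
       semantics; a decimal (or ternary, or whatever) implementation
       must still produce them, however x is physically stored. */
    assert((x << 1) == 0x58u);       /* shifting left doubles the value */
    assert((x & 0x0Fu) == 0x0Cu);    /* masking keeps the low four value bits */
    assert((x | 0x03u) == 0x2Fu);    /* OR sets value bits */
    assert((x + 1u) == 0x2Du);       /* ordinary arithmetic on the same value */

    return 0;
}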
6.2.6.1
[#3] Values stored in unsigned bit-fields and objects of
type unsigned char shall be represented using a pure binary
notation.
[#4] Values stored in non-bit-field objects of any other
object type consist of n × CHAR_BIT bits, where n is the size
of an object of that type, in bytes. ...
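The "array of chars" point can be seen in a sketch like this (mine, not
part of the quoted text): any object may be inspected byte by byte
through an unsigned char pointer, and those bytes must read as pure
binary whatever the hardware does underneath.

#include <stdio.h>

int main(void)
{
    unsigned int x = 1234u;
    const unsigned char *p = (const unsigned char *)&x;

    /* Every object is viewable as sizeof(object) bytes of CHAR_BIT
       bits each, and unsigned char is required to use a pure binary
       representation for those bytes. */
    for (size_t i = 0; i < sizeof x; i++)
        printf("byte %zu: %u\n", i, (unsigned)p[i]);

    return 0;
}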
Can a one's-complement machine conform to the C
standard?
Yes. There are three allowed representations for signed integral
types: two's complement, one's complement, and sign/magnitude.
6.2.6.2
[#2] For signed integer types, the bits of the object
representation shall be divided into three groups: value
bits, padding bits, and the sign bit. There need not be any
padding bits; there shall be exactly one sign bit. ...
... If the sign
bit is one, the value shall be modified in one of the
following ways:
-- the corresponding value with sign bit 0 is negated
(sign and magnitude);
-- the sign bit has the value -(2^N) (two's complement);
-- the sign bit has the value -(2^N - 1) (one's complement).
Which of these applies is implementation-defined, as is
whether the value with sign bit 1 and all value bits zero
(for the first two), or with sign bit and all value bits 1
(for one's complement), is a trap representation or a normal
value. In the case of sign and magnitude and one's
complement, if this representation is a normal value it is
called a negative zero.
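To make the three schemes concrete, here's a toy calculation of mine
(the bit pattern and names are made up for illustration): the same
8-bit pattern, sign bit set and value bits 0000011, comes out to a
different value under each representation.

#include <stdio.h>

/* Interpret an 8-bit pattern (sign bit plus N = 7 value bits) under
   each of the three representations the standard allows.  This only
   illustrates the arithmetic, not how any real compiler works. */
int main(void)
{
    unsigned bits  = 0x83u;             /* sign bit 1, value bits 0000011 */
    unsigned sign  = (bits >> 7) & 1u;
    unsigned value = bits & 0x7Fu;      /* the 7 value bits, here 3 */

    int sign_magnitude  = sign ? -(int)value : (int)value;  /* -3 */
    int twos_complement = (int)value - (sign ? 128 : 0);    /* -125: sign bit worth -(2^7) */
    int ones_complement = (int)value - (sign ? 127 : 0);    /* -124: sign bit worth -(2^7 - 1) */

    printf("sign/magnitude:   %d\n", sign_magnitude);
    printf("two's complement: %d\n", twos_complement);
    printf("one's complement: %d\n", ones_complement);
    return 0;
}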
A "trap representation" is an arrangement of bits that does not
represent a value of the relevant type, and is not necessarily operable
on even to the extent of storing it:
6.2.6.1
[#5] Certain object representations need not represent a
value of the object type. If the stored value of an object
has such a representation and is read by an lvalue
expression that does not have character type, the behavior
is undefined. If such a representation is produced by a
side effect that modifies all or any part of the object by
an lvalue expression that does not have character type, the
behavior is undefined. Such a representation is called a
trap representation.
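One practical consequence, sketched below with a hypothetical helper of
my own: byte-wise copying through unsigned char is exempt from that
rule, so it cannot hit undefined behaviour even when the bytes happen
to form a trap representation for the destination's declared type.

#include <stddef.h>

/* Copy an object's representation byte by byte.  Character-type
   accesses are exempt from the rule quoted above, so this never
   invokes undefined behaviour even if the bytes form a trap
   representation for the destination's declared type; only a later
   read through a non-character lvalue could then go wrong. */
void copy_representation(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
}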
For example, on that decimal machine, a C implementation may choose to
use only values 0..7 of each digit, to store three bits of information.
Any value with an 8 or 9 in it could then be a trap representation. (A
trap representation does not have to actually trap; undefined behaviour
can include doing something sensible to someone who knows what's really
going on under the hood.) I *think* this would conform; I'll ask
someone I know who knows this stuff better than I.
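As a toy model of that decimal scheme (entirely hypothetical; the
function name and digit layout are mine): each digit carries three bits
by being restricted to 0..7, and any stored digit of 8 or 9 plays the
role of a trap representation.

#include <stdio.h>

/* Decode three decimal digits, each restricted to 0..7 (i.e. three
   bits per digit), into a 9-bit value; return -1 for a pattern
   containing an 8 or 9, the "trap representation".  Purely
   illustrative -- no real implementation is being described. */
static int decode(const int digits[3])
{
    int value = 0;
    for (int i = 0; i < 3; i++) {
        if (digits[i] < 0 || digits[i] > 7)
            return -1;                  /* trap representation */
        value = value * 8 + digits[i];
    }
    return value;
}

int main(void)
{
    int ok[3]   = { 1, 7, 3 };          /* octal 173 = 123 decimal */
    int trap[3] = { 1, 9, 3 };          /* contains a 9: trap */

    printf("ok:   %d\n", decode(ok));
    printf("trap: %d\n", decode(trap));
    return 0;
}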