C compilers and non-ASCII systems

31 Jan 2012

On 31 Jan 2012 at 21:59, Toby Thain wrote:

...
This is essentially how Professor Knuth achieved portability to
non-ASCII systems for TeX, METAFONT and his other tools.
Essentially, the rule is "Don't do arithmetic on characters".
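
As a rough illustration of that rule (my own sketch, not taken from Knuth's sources): a hex-digit converter that gets its value from the position of the character in a lookup string, rather than from something like c - 'a' + 10. The function name hex_value is made up for this example. Arithmetic of the c - 'a' kind assumes the letters have consecutive codes, which does not hold for the alphabet as a whole in EBCDIC.

```c
#include <string.h>

/* Return the value 0..15 of a hexadecimal digit, or -1 if c is not one.
   The position in the lookup string supplies the value, so nothing
   depends on the numeric codes of the characters themselves.
   (hex_value is a hypothetical name used only for this sketch.) */
int hex_value(char c)
{
    static const char lower[] = "0123456789abcdef";
    static const char upper[] = "0123456789ABCDEF";
    const char *p;

    if (c == '\0')
        return -1;              /* strchr would match the terminator */
    if ((p = strchr(lower, c)) != NULL)
        return (int)(p - lower);    /* pointer arithmetic, not char arithmetic */
    if ((p = strchr(upper, c)) != NULL)
        return (int)(p - upper);
    return -1;
}
```

The only subtraction left is between two pointers into the same string, which is well defined whatever the character set is.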

But I've had a lot of questions about what the specs actually mean.

The "smallest addressable unit" in C being type char apparently 
doesn't mean that the machine itself has to be char-addressable.  

For example, a machine with 128-bit words that is addressable only by 
word address doesn't need to make type char 128 bits wide; the 
compiler and run-time only need to provide some means of addressing 
chars, even if that means a separate system of addressing, 
e.g. "C" addresses are machine addresses shifted by 4 bits.  
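
A minimal sketch of what that shifted addressing could look like, simulated on an ordinary host. Everything here is an assumption for illustration: 8-bit chars (so 16 per 128-bit word), the names word128, c_addr, make_c_addr, load_char and store_char, and the tiny 8-word memory. There is no bounds checking.

```c
#include <stdint.h>

#define CHARS_PER_WORD 16u   /* 128-bit word / 8-bit char (assumed) */
#define CHAR_SHIFT      4u   /* log2(CHARS_PER_WORD) */
#define MEM_WORDS       8u   /* arbitrary size for the simulation */

/* One 128-bit machine word, modelled as 16 eight-bit lanes. */
typedef struct { uint8_t lane[CHARS_PER_WORD]; } word128;

/* The only thing the simulated hardware can address: whole words. */
static word128 word_mem[MEM_WORDS];

/* Hypothetical compiler-level char address:
   (word address << 4) | char offset within the word. */
typedef uint32_t c_addr;

c_addr make_c_addr(uint32_t word_addr, uint32_t offset)
{
    return (word_addr << CHAR_SHIFT) | (offset & (CHARS_PER_WORD - 1u));
}

uint8_t load_char(c_addr a)
{
    word128 w = word_mem[a >> CHAR_SHIFT];        /* whole-word fetch */
    return w.lane[a & (CHARS_PER_WORD - 1u)];     /* select within the word */
}

void store_char(c_addr a, uint8_t v)
{
    word128 w = word_mem[a >> CHAR_SHIFT];        /* read the word */
    w.lane[a & (CHARS_PER_WORD - 1u)] = v;        /* modify one lane */
    word_mem[a >> CHAR_SHIFT] = w;                /* write the word back */
}
```

With this layout, adding 1 to a c_addr steps to the next char and carries into the word address after 16 steps, so char pointer arithmetic stays ordinary integer arithmetic; only loads and stores need the extra shift and mask.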

I suppose it's even possible to create a C where word addresses == 
char addresses; the char being aligned in a word, one char per word, 
with the remainder of the word unused.

So does the difference between two void* pointers necessarily equate 
to a count of chars between those addresses?  Take the case of one 
char per word above, for example.
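
For what it's worth, ISO C does not define subtraction on void* at all (some compilers accept it as an extension), so the question has to be asked in terms of char*. A small sketch, with char_distance and int_distance as made-up helper names, showing the two ways of counting:

```c
#include <stddef.h>

/* Distance between two addresses, counted in chars. The pointers are
   converted to char * first because void * cannot be subtracted in ISO C.
   Both must point into (or one past the end of) the same object for the
   result to be defined. */
ptrdiff_t char_distance(const void *from, const void *to)
{
    return (const char *)to - (const char *)from;
}

/* Distance between two int elements of the same array, counted in ints. */
ptrdiff_t int_distance(const int *from, const int *to)
{
    return to - from;
}
```

For int a[4], int_distance(&a[0], &a[3]) is 3 and char_distance(&a[0], &a[3]) is 3 * sizeof(int), whatever sizeof(int) is. Since sizeof(char) is 1 by definition, a one-char-per-word implementation would still report differences in chars; it would just have a large CHAR_BIT and quite possibly sizeof(int) == 1.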

Do char and int addresses have to share the same space?  Or can chars 
and ints enjoy separate addressing spaces?  Do addressing spaces need 
to be compatible?  (I'm thinking of low-end 8-bit PIC and AVR parts, 
where data stored in code space as constants don't have the same 
granularity.)

--Chuck
