C & undefined behaviour - was Re: tumble under BSD

Mouse mouse at Rodents-Montreal.ORG
Sun Apr 3 14:53:59 CDT 2016


>> Indeed, intel segmented memory model was weird.  [...]
>> Far pointers were insanity-inducing, though.  Since there were
>> multiple ways to represent the same address as a far pointer, [...]
>> Thankfully, huge pointers behaved exactly as one would expect, [...]

> There we have the issue.  Often when people speak of what they "expect" you'$

(Please don't use paragraph-length lines.)  Yes and no.  A lot of C is
undefined, or implementation-defined, to allow different disagreeing
implementations to coexist.  Someone writing under and for one
particular implementation may indeed reasonably expect certain
behaviour that the standard does not promise - for example, if I'm
writing for a SS-20, I consider it reasonable to expect ints to be
32-bit two's-complement, even though C qua C does not promise either
part of that, and, while a new compiler may in principle break either
part or both, it would have to violate the "int is the `natural'
integer type for the architecture" principle to do so.
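
An expectation of this kind can be written down so that the build fails
if it ever stops holding.  What follows is a minimal sketch, not
anything from the original discussion: it assumes a C11 compiler (for
_Static_assert), and the function name int_assumptions_hold is made up
for illustration.

```c
#include <limits.h>

/* Assumption: C11 or later, so that _Static_assert is available. */

/* int occupies exactly 32 bits of storage... */
_Static_assert(sizeof(int) * CHAR_BIT == 32, "int is not 32 bits wide");

/* ...and all of them are used (no padding bits), so INT_MAX is 2^31-1. */
_Static_assert(INT_MAX == 2147483647, "int has padding bits");

/* Ones'-complement and sign-magnitude have INT_MIN == -INT_MAX, so
   this holds only for two's-complement. */
_Static_assert(INT_MIN == -INT_MAX - 1, "int is not two's-complement");

/* Hypothetical helper: the same checks evaluated at run time.  If this
   file compiled at all, the assertions above passed and it returns 1. */
int int_assumptions_hold(void)
{
    return sizeof(int) * CHAR_BIT == 32 && INT_MIN == -INT_MAX - 1;
}
```

This does not make the code portable; it only turns a silent assumption
into a compile-time error on an implementation that breaks it.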

Another part of this is that C originated as, and is still used as, an
OS implementation language.  In such use, it is not unreasonable to
treat it as the "high-level assembly language" some people have called
it - apparently intending it to be a criticism while not understanding
that, in some senses, that's what C is _supposed_ to be.  And, from
that point of view, compilers that take advantage of formally-undefined
behaviour to optimize things as sketched here and in Mr. Regehr's
writings are not clever; they are broken.

I don't think either position is unreasonable.  Which I suppose
really means that I think there are places both for compilers that act
like high-level assemblers, doing the unsurprising thing from the POV
of someone familiar with the architecture being compiled for, and for
compilers that take advantage of all the liberty the language spec
allows to optimize the hell out of the code.

I'm not sure there is any fix for the problems arising when people try
to satisfy both desires with the same compiler (or the same set of
configuration switches to a single compiler, or some such).  It's
basically the "is this language right for this task?" problem in
slightly different dress.

> When a better compiler (with more powerful optimization) breaks the
> program, the compiler is blamed rather than the programmer who made
> the incorrect assumption.

Or, to see it from the "high-level assembly" position, when a less
appropriate compiler (with more aggressive optimization) is used, it
is, correctly, blamed (for not being appropriate to the task at hand).

> Ideally compilers would flag all undefined programs, but in practice they do$

It's not possible in general, because sometimes the undefined behaviour
depends on something not known until run time.  Consider

	int v;
	scanf("%d",&v);
	printf("%d",v+1);

This is perfectly well-defined - until and unless someone feeds it (a
suitable textual representation of) INT_MAX.  There might be a place
for a compiler that flagged every instance of undefined behaviour, even
if it means otherwise unnecessary run-time costs, but for most purposes
that would be a Bad Thing.  (I've often contemplated building a
`checkout' compiler that deliberately went out of its way to break
various assumptions people tend to make that aren't promised, things
like "all pointers are really just memory addresses, with pointer casts
being no-ops" or "all signed arithmetic is two's-complement" or "the
stack grows down" or "pointers into different objects are comparable"
or "shims are inserted into structs only when necessary to avoid
placing objects at unusual alignments" or "there are no padding bits in
integer representations" or "nil pointers are all-bits-zero"....)
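
For the v+1 case specifically, the run-time cost of avoiding the
undefined behaviour is one comparison made before the addition.  A
minimal sketch, with checked_increment being a hypothetical name rather
than a standard function:

```c
#include <limits.h>

/* Hypothetical helper: store v+1 in *out and return 1, or leave *out
   untouched and return 0 when v+1 would overflow.  The test comes
   before the addition, so the overflowing add is never evaluated. */
int checked_increment(int v, int *out)
{
    if (v == INT_MAX)
        return 0;
    *out = v + 1;
    return 1;
}
```

A caller would print the result only when the function returns 1, and
report the input as out of range otherwise.  Checking after the fact
(for example, testing whether v+1 < v) does not work, since by then the
undefined addition has already happened.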

> This paper https://pdos.csail.mit.edu/papers/ub:apsys12.pdf is an
> excellent survey.

A pity pdos.csail.mit.edu is willing to impair its accessibility for
the sake of..I'm not sure what..by refusing to serve it over HTTP.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse at rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

