On Fri, Aug 16, 2024 at 04:00:40PM -0600, ben via cctalk wrote:
> On 2024-08-16 8:56 a.m., Peter Corlett via cctalk wrote:
[...]
>> This makes them a perfect match for a brain-dead language. But what does
>> it even *mean* to "automatically promote smaller data types to larger
>> ones"? That's a rhetorical question, because your answer will probably
>> disagree with what the C standard actually says :)
> I have yet to read a standard; I can never find, or afford, the
> documentation.
Google "N3096.PDF". You're welcome.
[...]
> Now I need to get a cross assembler and C compiler for the 68K.
The GNU tools work fine for me for cross-compiling to bare-metal m68k. I use
them on Unix systems, but you can probably get them working on Windows if you
must. I even maintain a set of patches for GCC to do register parameters,
although unless you specifically need that functionality, upstream GCC is
just fine.
[...]
>> Now, what kind of badly-written code and/or braindead programming
>> language would go out of its way to be inefficient and use 32-bit
>> arithmetic instead of the native register width?
> The problem is the native register width keeps changing with every CPU. C
> was a quick and dirty language for the PDP-11, with 16-bit ints. They never
> planned that UNIX or C or hardware would change like it did, so one gets a
> patched version of C. That reminds me that I use gets() and have to get an
> older version of C.
They'd have had to be fairly blinkered not to notice the S/360 series, which
had been around for years before the PDP-11 came out. It doesn't take a
particularly large crystal ball to realise that computers got smaller and
cheaper over time and features from larger machines such as wider registers
would filter down into minicomputers and microcomputers.
But C also seems to ignore a lot of the stuff we already knew in the 1960s
about how to design languages to avoid programmers making various common
mistakes, so those were quite large blinkers. They've never been taken off
either: when Rob and Ken went to work for Google they came up with a "new"
C-like language which makes many of the same mistakes, plus some new ones,
and it is also more bloated and can't even be used to write bare-metal stuff,
which is one of the few things one might reasonably need C for in the first
place.
[...]
>> It's not just modern hardware which is a poor fit for C: classic hardware
>> is too. Because of a lot of architectural assumptions in the C model, it
>> is hard to generate efficient code for the 6502 or Z80, for example.
> Or any PDP that isn't a 10 or 11.
> I heard that AT&T had a C CPU but it turned out to be a flop. C's main
> advantage was a stack for local variables and return addresses, and none of
> the complex subroutine nesting of ALGOL or PASCAL.
That'd be the AT&T Hobbit, "optimized for running code compiled from the C
programming language". It's basically an early RISC design which spent too
much time in development and the released product was buggy, slow,
expensive, and had some unconventional design decisions which would have
scared off potential users. Had it come out earlier it might have had a
chance, but then ARM came along and that was that.
Complex subroutine nesting can be done just fine on a CPU "optimised for"
running C. For example, you can synthesise an anonymous structure to hold
pointers to or copies of the outer variables used in the inner function, and
have the inner function take that as its first parameter. This is perfectly
doable in C itself, but nobody would bother because it's a lot of
error-prone boilerplate. But if the compiler does it automatically, it
suddenly opens up a lot more design options which result in cleaner code.
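
For the curious, here is roughly what that hand-written boilerplate looks
like; all the names are invented for illustration:

    #include <stdio.h>

    /* The synthesised "environment": pointers to the outer variables
       that the inner function uses. */
    struct env {
        int *total;
    };

    /* The "inner function", taking the environment as its first
       parameter, as a closure-aware compiler would arrange. */
    static void add_to_total(struct env *e, int n) {
        *e->total += n;
    }

    int main(void) {
        int total = 0;
        struct env e = { &total };

        add_to_total(&e, 3);
        add_to_total(&e, 4);
        printf("%d\n", total);  /* prints 7 */
        return 0;
    }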
>> But please, feel free to tell me how C is just fine and it's the CPUs
>> which are at fault, even those which are heavily-optimised to run typical
>> C code.
> In a computer system, the CPU, memory, IO, video & mice all have to share
> the same pie. If you want one thing to go faster, something else must go
> slower. C's model is random-access main memory for simple variables and
> array data. A register was for a simple pointer or datum. Caches may seem
> to speed things up, but they can't handle random data:
> (REAL(I+3,J+3)+REAL(I-3,J-3)+REAL(I+3,J-3)+REAL(I-3,J+3))/4.0+REAL(I,J)
A "computer system" today is a network of multiple CPUs running
autonomously, and are merely co-ordinated by the main CPU. Adding a faster
disk to my system does not per se make the main CPU slower, although of
course the improved performance means that the CPU load may go up purely
because it is not being held back on I/O and can achieve higher throughput.
Graphics cards are a very extreme form of moving things off the main CPU.
They are special-purpose parallel CPUs which can number-crunch certain
problems (such as making Tamriel beautiful) many orders of magnitude faster
than the main CPU.
And did I write "main CPU"? A typical PC today has four "main" CPUs on
one
die. There's even a network between those CPUs, and each CPU really does see
the others as mere I/O devices onto which they can offload work.
This is very much not the zero-sum game you imply.
> I will stick to a REAL PDP-8. I know a TAD takes 1.5 us, not 1.7 us 70% of
> the time and 1.4 us the other 30%. Real-time OSes and CPUs are out there;
> how else would my toaster know when to burn my toast?
While general-purpose high-performance CPUs have rather uneven performance,
the sheer brute force overcomes a lot of latency concerns. If I can perform
a peak of 50 additions per nanosecond, but there are occasional
microsecond-long stalls bringing the performance down to 90% of that, that's
still more than good enough for general-purpose computing.
If I need hard realtime performance with sub-microsecond accuracy, I've got
a box of microcontrollers. It's a simple RISC CPU with a two-stage pipeline
and zero-wait-state SRAM rather than a CPU cache, meaning all instructions
have predictable timing, generally 1 or 2 cycles. Oh, and it is rated to go
at up
to 133MHz and costs €3 each in the retail version packaged up nicely on a
PCB resembling a DIP40 chip and ready to plug into a breadboard, or $1 each
if I'm buying the bare microcontrollers wholesale.
In fact it contains *two* CPUs, although since they share the same RAM banks
this can introduce contention which will affect execution time. There are
two fixes for that problem: program it carefully such that contention does
not happen, or just spend another dollar and add a second microcontroller.
It can emulate that PDP-8 just fine, with exactly the same instruction
timing, mostly because it can execute 100-200 instructions in 1.5
microseconds, so it'll spend much of its time twiddling its thumbs waiting
for the next emulated clock tick. Twiddling its Thumb-2, in fact, so you
have probably already worked out which microcontroller I'm referring to.
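
The emulation loop itself is then mostly a wait. A sketch of the idea, where
timer_ticks() stands in for whatever free-running counter the part provides
(that name and the 10MHz rate are my assumptions, not a real API):

    #include <stdint.h>

    /* Hypothetical free-running 10MHz counter (one tick = 0.1us); every
       microcontroller in this class has an equivalent, but this name and
       rate are assumptions for the sketch. */
    extern uint32_t timer_ticks(void);

    /* Decode and execute one emulated PDP-8 instruction, returning its
       cost on the real machine in ticks, e.g. 15 for a 1.5us TAD. The
       host does this work in a small fraction of that time. */
    extern uint32_t pdp8_step(void);

    void emulate(void) {
        uint32_t next = timer_ticks();
        for (;;) {
            next += pdp8_step();
            /* Twiddle thumbs until the emulated instruction "completes".
               The signed cast keeps the comparison correct across
               counter wraparound. */
            while ((int32_t)(timer_ticks() - next) < 0)
                ;
        }
    }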
> Only by knowing the overall structure of a program and the hardware can
> one optimize it.
That's venturing into the "premature optimisation" which Knuth warns about.
In anything but the most trivial of systems, there are enough unknown
unknowns that pontificating about what should be slow does not reliably
deliver the correct answer, and just running the code and measuring it to
see which parts are taking too long is a much more productive use of one's
time.
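
A minimal sketch of that measuring, using nothing fancier than POSIX
clock_gettime(); a real profiler does this better, but even a crude harness
beats pontificating:

    #include <stdio.h>
    #include <time.h>

    /* Stand-in for whatever code is under suspicion. */
    static volatile double sink;
    static void suspect_function(void) {
        for (int i = 0; i < 10000; i++)
            sink += i * 0.5;
    }

    int main(void) {
        enum { RUNS = 1000 };
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < RUNS; i++)
            suspect_function();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.3f us per call\n", secs * 1e6 / RUNS);
        return 0;
    }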