Early Programming Books

Mon Jun 21 03:35:07 CDT 2021

On Sun, Jun 20, 2021 at 08:06:26PM -0600, ben via cctalk wrote:
[...]
> My latest gripe, is I still am looking for a algorithm to generate code
> for a single accumulator machine for an arithmetic expression. Parenthesis
> need to evaluated first and temporary variables allotted, thus a two pass
> algorithm. Everything is single pass. Recursive decent can parse but can't
> generate 'correct' code. A-(B+C) is LD B ADD C ST T1 LD A SUB T1, not LD A
> ST T1 LD B ST T2 LD C ADD T2 NEGATE ADD T1

TL;DR: you're probably setting so many implicit design constrants that such
an algorithm cannot exist.

That accumulator machine sounds a bit like the 6502, for which there are
plenty of BASIC interpreters which can parse and evaluate that expression.
BBC BASIC is my favourite, but Apple's or Microsoft's implementation also do
the job. The ones I've looked at were recursive-descent. Other more advanced
parsers take too much precious memory.

The modern hotness for parsing is parser combinators, but they're really
just a fancy way to write recursive-descent parsers which are easier to read
and reason about. They certainly can parse something as simple as "A-(B+C)",
since recursive-descent can.

Code-generation is a whole different can of worms and unlike the
well-trodden path of parsing, is still not a solved problem in computer
science. All of the compiler textbooks I've checked, including the infamous
Dragon book, handwave code generation, introducing the tree-matching
algorithm which works well for straight-line code on their made-up
educational CPU architectures, but leaves a lot on the table with real-world
CPUs. This limitation probably doesn't matter for your CPU.

It seems that you might want to produce code directly from parsing. This
shortcut can work if you're producing code for a stack machine such as
FORTH, but not a register machine (unless you use your registers as a
stack). A common wheeze on the 6502 is to implement a virtual stack machine
interpreter, and then the compiler targets that.

To do code-generation "properly", you need a multi-pass algorithm: parse the
source into a tree, optionally do tree-to-tree transformations to do type
checks and/or apply optimisations, and then apply the tree-matching
algorithm to that to produce machine code. This is all well-described in any
good compiler textbook.

Note that except for trivial expressions, you'll need a stack and/or other
registers to squirrel away intemediate results. Consider "(A*B)-(C*D)": you
need to save the result of one of the multiplications before doing the other
and then the subtraction.

In the case of the tree-matching algorithm, if your CPU (or rather, the
model given to the tree-matcher) is too limited to evaluate a given
expression, it will definitively fail and give you a subtree that it cannot
match. You then have the "joy" of fixing the model, possibly by adding
virtual instructions which are really library calls. For a simple CPU, such
calls might account for most of the generated code, at which point a virtual
machine becomes a good idea.

The book "Writing interactive compilers and interpreters" may be more
suitable if you're looking for quick hacks to get up and running on a small
machine, assuming you can lay your hands on a copy.