BASIC (Was: Reading HP2000 tapes)

Paul Koning paulkoning at comcast.net
Wed Jul 18 12:47:15 CDT 2018



> On Jul 18, 2018, at 1:21 PM, Paul Berger via cctalk <cctalk at classiccmp.org> wrote:
> 
> I would think that any interpreted BASIC would do this, or for that matter any interpreted language, except maybe for APL, which is pretty much written with tokens anyway.  One other exception I can think of is Perl, which is stored as source text.  Saving in tokenized form was good for two reasons: it saved storage space, both in memory and on mass storage, and when you loaded the program it was ready to go.
> 
> Paul...
>> 
>>>>> I think it was called a "decompiler" though.  Seemed like magic at the time.
>>>>> 
>>>>> Googling reveals "You may be remembering the BASIC PLUS
>>>>> decompiler under RSTS.  RSTS BASIC PLUS was interpreted from "push-pop" code.
>>>>> The symbol table was available in the compiled file, and the correspondence
>>>>> between push-pop operations and BASIC PLUS source was very close, so you
>>>>> could get back very reasonable code."
>>>>> 
>>>>> And our previous discussion of it a decade ago:
>>>>> 
>>>>> https://marc.info/?l=classiccmp&m=121804804023540&w=2

I would not say "written with tokens".

BASIC-PLUS essentially used a stack machine code, which is easy to generate and pretty efficient.  It wasn't designed to be reversible, but since the symbol table was saved as well (it had to be, to allow for incremental editing and interactive debugging) you could reverse it pretty easily.  This sort of thing has a long history.  UCSD Pascal used something similar, which it called "P-code".  The TUTOR language of the University of Illinois PLATO system did as well, except for expressions, which were compiled into actual machine code.  That sort of mixed encoding was used a decade earlier in the first ALGOL compiler, by Dijkstra and Zonneveld, 1961, for the EL-X1.  And yes, it's still done a lot; I believe Python is a good example.
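
To make the stack-code idea concrete, here is a minimal sketch in Python.  It is not the real BASIC-PLUS encoding: the opcode names (PUSHVAR, PUSHCONST, OP) and the helper functions are invented for illustration, and Python's own ast module stands in for a real parser.  It only shows why saving the symbol table alongside the push-pop code is enough to get readable source back.

```python
import ast
import operator

# Illustrative opcodes only; this is NOT the real BASIC-PLUS encoding.
BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

def compile_expr(source, symtab):
    """Translate an infix expression into push-pop (postfix) code.

    Variables are stored as indexes into symtab, which grows as needed.
    """
    code = []

    def emit(node):
        if isinstance(node, ast.BinOp):
            emit(node.left)
            emit(node.right)
            code.append(("OP", BINOPS[type(node.op)]))
        elif isinstance(node, ast.Name):
            if node.id not in symtab:
                symtab.append(node.id)
            code.append(("PUSHVAR", symtab.index(node.id)))
        elif isinstance(node, ast.Constant):
            code.append(("PUSHCONST", node.value))
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return code

def run(code, values):
    """Interpret the stack code; values[i] is the value of symbol i."""
    stack = []
    for kind, arg in code:
        if kind == "PUSHCONST":
            stack.append(arg)
        elif kind == "PUSHVAR":
            stack.append(values[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[arg](left, right))
    return stack.pop()

def decompile(code, symtab):
    """Rebuild (fully parenthesized) source text from code plus symbol table."""
    stack = []
    for kind, arg in code:
        if kind == "PUSHCONST":
            stack.append(repr(arg))
        elif kind == "PUSHVAR":
            stack.append(symtab[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {arg} {right})")
    return stack.pop()

symtab = []
code = compile_expr("A + B * 2", symtab)
print(code)
print(run(code, [3, 4]))        # A=3, B=4
print(decompile(code, symtab))
```

The decompiled text is equivalent to the original but not identical (the parentheses are regenerated, and anything the compiler dropped, such as comments, is gone), which matches the "very reasonable code" description quoted above.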

A somewhat different approach is found in RT-11 BASIC, a somewhat simpler language than BASIC-PLUS and an unrelated implementation.  That one does convert the text into tokens; it doesn't generate a stack language transformation as B+ did.  And the token encoding is explicitly designed to be reversible: when you use the LIST command the token stream is converted back to source text.  That means, for example, that comments are included in the token stream (unlike B+).
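
The reversible-token approach can be sketched the same way.  Again this is only an illustration in Python: the keyword list and the token byte values (0x80 and up) are hypothetical, not the actual RT-11 BASIC encoding, and the input is assumed to be plain ASCII.  The point is that keywords shrink to one byte each while everything else, including the comment text, is kept, so a LIST-style function can reproduce the source exactly.

```python
# Hypothetical keyword table and token values; NOT the real RT-11 BASIC encoding.
KEYWORDS = ["PRINT", "LET", "GOTO", "IF", "THEN", "REM"]
TOKEN_OF = {word: 0x80 + i for i, word in enumerate(KEYWORDS)}
WORD_OF = {tok: word for word, tok in TOKEN_OF.items()}

def tokenize(line):
    """Replace keywords with one-byte tokens; keep all other text as ASCII."""
    out = bytearray()
    i = 0
    while i < len(line):
        if line[i] == '"':
            # String literals are copied verbatim, never scanned for keywords.
            end = line.find('"', i + 1)
            end = len(line) if end < 0 else end + 1
            out += line[i:end].encode("ascii")
            i = end
            continue
        for word, tok in TOKEN_OF.items():
            if line.startswith(word, i):
                out.append(tok)
                i += len(word)
                if word == "REM":
                    # The comment text stays in the token stream.
                    out += line[i:].encode("ascii")
                    i = len(line)
                break
        else:
            # Not a keyword: keep the character (raises on non-ASCII input).
            out += line[i].encode("ascii")
            i += 1
    return bytes(out)

def list_line(tokens):
    """The LIST direction: expand token bytes back into source text."""
    return "".join(WORD_OF[b] if b >= 0x80 else chr(b) for b in tokens)

src = '10 IF A>1 THEN PRINT "IF" REM check A'
stored = tokenize(src)
print(len(src), len(stored))
print(list_line(stored) == src)
```

Like some real tokenizers of the period, this sketch matches keywords anywhere outside a string literal, so a variable named LETTER would be stored as a LET token followed by "TER".  That does not affect reversibility, since each token always expands back to exactly the text it replaced.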

	paul


