BASIC (Was: Reading HP2000 tapes)

18 Jul 2018

...
  On Jul 18, 2018, at 1:21 PM, Paul Berger via cctalk
<cctalk at classiccmp.org> wrote:

 I would think that any interpreted BASIC would do this, or for that matter any interpreted
language, except maybe for APL, which is pretty much written with tokens anyway.  One other
exception I can think of is Perl, which is stored as source text.  Saving in tokenized
form was good for two reasons: it saved storage space, both in memory and on mass storage,
and when you loaded the program it was ready to go.

 Paul...
> 
>>>> I think it was called a "decompiler" though.  Seemed like magic
>>>> at the time.
>>>>
>>>> Googling reveals "You may be remembering the BASIC PLUS
>>>> decompiler under RSTS.  RSTS BASIC PLUS was interpreted from
>>>> "push-pop" code.
>>>> The symbol table was available in the compiled file, and the
>>>> correspondence between push-pop operations and BASIC PLUS source
>>>> was very close, so you could get back very reasonable code."
>>>>
>>>> And our previous discussion of it a decade ago:
>>>>
>>>> https://marc.info/?l=classiccmp&m=121804804023540&w=2

I would not say "written with tokens".

BASIC-PLUS essentially used a stack machine code, easy to generate and pretty efficient.
It wasn't designed to be reversible, but since the symbol table was saved as well (had
to be, to allow for incremental editing and interactive debugging) you could reverse it
pretty easily.  This sort of thing has a long history.  UCSD Pascal used something
similar, which it called "P-code".  The TUTOR language of the U of Illinois
PLATO system did as well, except for expressions, which were compiled into actual machine
code.  That sort of mixed encoding was used a decade earlier in the first ALGOL compiler,
by Dijkstra and Zonneveld, 1961, for the EL-X1.  And yes, it's still done a lot; I
believe Python is a good example.
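
To make the "push-pop plus symbol table" point concrete, here is a minimal sketch in Python.  It is not the actual BASIC-PLUS encoding: the opcode names (PUSH, OP), the function names and the tuple layout are all made up for illustration.  It only shows why keeping the symbol table next to stack code lets you get readable source back.

```python
import operator

# Hypothetical push-pop encoding, NOT the real BASIC-PLUS format.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def compile_expr(tokens, symtab):
    """Translate an infix token list into push-pop code (shunting-yard)."""
    code, pending = [], []
    for tok in tokens:
        if tok in PREC:
            # Emit pending operators that bind at least as tightly.
            while pending and PREC[pending[-1]] >= PREC[tok]:
                code.append(("OP", pending.pop()))
            pending.append(tok)
        else:
            # Operands are stored as indexes into the symbol table.
            if tok not in symtab:
                symtab.append(tok)
            code.append(("PUSH", symtab.index(tok)))
    while pending:
        code.append(("OP", pending.pop()))
    return code

def run(code, symtab, values):
    """Interpret the push-pop code with a plain value stack."""
    stack = []
    for kind, arg in code:
        if kind == "PUSH":
            stack.append(values[symtab[arg]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()

def decompile(code, symtab):
    """Rebuild source text by 'running' the code on strings, not values."""
    stack = []
    for kind, arg in code:
        if kind == "PUSH":
            stack.append(symtab[arg])   # the symbol table supplies the name
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {arg} {right})")
    return stack.pop()

symtab = []
code = compile_expr(["A", "+", "B", "*", "C"], symtab)
print(code)
print(decompile(code, symtab))
```

The decompiled text is fully parenthesized rather than identical to what was typed, which matches the "very reasonable code" description: the structure and names survive, the exact original layout does not.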

A different approach is found in RT-11 BASIC, a somewhat simpler language than
BASIC-PLUS and an unrelated implementation.  That one does convert the text into tokens;
it doesn't generate a stack language transformation as B+ did.  And the token encoding
is explicitly designed to be reversible: when you use the LIST command the token stream is
converted back to source text.  That means, for example, that comments are included in the
token stream (unlike B+).
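
For contrast, a reversible tokenizer can be sketched like this.  Again this is only an illustration in Python: the keyword list, the 0x80 token base and the function names are assumptions, not the real RT-11 BASIC encoding, and it assumes plain ASCII source lines.

```python
# Hypothetical token table, NOT the real RT-11 BASIC encoding.
KEYWORDS = ["PRINT", "LET", "GOTO", "IF", "THEN", "REM"]
TOKEN_BASE = 0x80   # bytes >= 0x80 are keyword tokens, the rest plain ASCII

def tokenize(line):
    """Replace each keyword with a one-byte token; keep all other text."""
    out = bytearray()
    i = 0
    while i < len(line):
        for n, kw in enumerate(KEYWORDS):
            if line.startswith(kw, i):
                out.append(TOKEN_BASE + n)
                i += len(kw)
                if kw == "REM":
                    # Comment text is stored verbatim so LIST can show it.
                    out += line[i:].encode("ascii")
                    i = len(line)
                break
        else:
            out.append(ord(line[i]))   # ordinary character, copied as-is
            i += 1
    return bytes(out)

def list_line(tokens):
    """What LIST would do: expand tokens back into source text."""
    return "".join(KEYWORDS[b - TOKEN_BASE] if b >= TOKEN_BASE else chr(b)
                   for b in tokens)

src = "10 IF A THEN PRINT B REM sum so far"
stored = tokenize(src)
print(len(src), len(stored))
print(list_line(stored))
```

Because nothing is discarded, listing gives back the line exactly as entered, comment included, while the stored form is still shorter than the text.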

	paul
