Typing in lost code - test-drb@ccmp.vtda.org - classiccmp.org


Typing in lost code


jnc＠mercury.lcs.mit.edu

23 Jan 2022

12:31 p.m.

From: Gavin Scott

I think if I had a whole lot of old faded greenbar etc. ... Someone may even have done this already

See: https://walden-family.com/impcode/imp-code.pdf

Someone's already done the specialist OCR to deal with faded program listings.

Noel


gavin＠learn.bio

23 Jan 2022

1:04 p.m.

On Sun, Jan 23, 2022 at 11:31 AM Noel Chiappa via cctalk <cctalk at classiccmp.org> wrote:

See: https://walden-family.com/impcode/imp-code.pdf Someone's already done the specialist OCR to deal with faded program listings.

Neat. Though all the complex character recognition part of that work is now like 15-20 lines of Python code (using either Keras or PyTorch).


dkelvey＠hotmail.com

1:06 p.m.

It is unlikely that no current day OCR will produce an error free listing. It is possible to train an AI to do this, but it requires specific training: it must be trained on the specific machine code and on the same format. Any generic OCR will make many errors if the text is hard to read. The final product must include notes on the things it is not sure about, or it would be useless.

I recovered a listing for the 4004 processor that was printed on an ASR33 with ruts on the platen. The right-hand 1/4 of the letters was missing at several locations across the page. Letters such as F and P, as well as 0 and C, were often not printed well enough to distinguish. Luckily, F and P were usually easy to determine from context, but 0 and C were often used to describe a hex number. Unlike the text on this page, the differences were not always obvious.

Getting to working code required noting which characters could be one or the other. The only way to determine most of these was by simulating the code. Almost all of the 0 vs. C cases turned out to be 0, as these were initializing a pointer base number (context of usage). In one case, it was only through the simulation that I was able to determine that it was really CC and not 00. Marking the locations of uncertainty was essential to know where to check the program code's context. Any OCR that doesn't include possible options and isn't trained on that particular code is worthless.

Dwight
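Dwight's point about damaged glyphs can be illustrated with a toy example. The sketch below is a minimal nearest-template matcher using only the Python standard library; it is not the Keras/PyTorch approach Gavin mentions, and the 4x5 bitmaps in `TEMPLATES` are hypothetical, not taken from any real ASR33 typeface. Unreadable pixels are marked `?` and skipped, and `candidates` returns every character scoring close to the best rather than a single answer. With the rightmost column (the right-hand quarter) lost, 0 ties with C and F ties with P, which is exactly the kind of spot a recovered listing needs to flag.

```python
# Hypothetical 4x5 glyph templates: '#' = ink, '.' = blank paper.
TEMPLATES: dict[str, list[str]] = {
    "0": ["####", "#..#", "#..#", "#..#", "####"],
    "C": ["####", "#...", "#...", "#...", "####"],
    "F": ["####", "#...", "###.", "#...", "#..."],
    "P": ["####", "#..#", "####", "#...", "#..."],
}


def rank_glyph(glyph: list[str]) -> list[tuple[str, float]]:
    """Score each template by the fraction of readable pixels that agree.

    Pixels marked '?' (ink never printed, e.g. a worn platen) carry no
    evidence and are skipped rather than counted as blank.
    """
    scores = []
    for char, template in TEMPLATES.items():
        known = agree = 0
        for glyph_row, template_row in zip(glyph, template):
            for g, t in zip(glyph_row, template_row):
                if g == "?":
                    continue
                known += 1
                agree += int(g == t)
        scores.append((char, agree / known if known else 0.0))
    # Best first; sorted() is stable, so tied characters keep template order.
    return sorted(scores, key=lambda cs: cs[1], reverse=True)


def candidates(glyph: list[str], margin: float = 0.05) -> list[str]:
    """Return every character within `margin` of the best score, not just the winner."""
    ranked = rank_glyph(glyph)
    best = ranked[0][1]
    return [char for char, score in ranked if best - score <= margin]


if __name__ == "__main__":
    # A '0' whose right-hand column never reached the paper.
    damaged_zero = ["###?", "#..?", "#..?", "#..?", "###?"]
    print(candidates(TEMPLATES["0"]))  # ['0']
    print(candidates(damaged_zero))    # ['0', 'C']
```

A real recognizer would be far more robust than this, but the design point carries over: the output for each character is a ranked set of options, so ambiguous spots can be recorded instead of silently guessed.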


dkelvey＠hotmail.com

11:47 p.m.

Sorry about the double negative. I was in a hurry, as I was supposed to drive over the hill to Santa Cruz for a couple of hours. "It is unlikely that no current day OCR will produce an error free listing." should have read: "It is unlikely that any current day OCR will produce an error free listing."

I agree with Chuck. A computer code listing cannot tolerate a single mistake in a number. I recall recovering data from cassette tapes where the tape stuck to the capstan and got folds. Most of the code was in BASIC, so it had quite a bit of redundancy in the program flow. Luckily, there were few damaged segments with numeric values. The tapes had checksums that helped quite a bit. That is not the case with typed listings.

As I stated, the 4004 code I recovered would have been lost if I'd not understood its purpose and run a simulation of the code, stopping to see what alternate values did to its execution. It was over 3K of code, quite a bit for a 4004. It was intended to be loaded into 13 1702A EPROMs. There were over 30 points in the code that needed to be resolved.

Dwight
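A minimal sketch of the mark-then-resolve workflow, assuming each uncertain position has already been reduced to a short tuple of candidate characters. `annotate` writes the uncertainty into the listing, and `resolve` keeps only the readings that pass a test. Here the test is a hypothetical record checksum (`byte_sum`, the sum of the bytes modulo 256); for a listing with no checksum, like the 4004 one, `accept` would instead be a hypothetical check that runs the candidate code in a simulator. All names are illustrative.

```python
from itertools import product
from typing import Callable, Iterator, Sequence

# One entry per character position: certain positions hold one candidate,
# uncertain ones hold several, e.g. ("0", "C").
Tokens = Sequence[tuple[str, ...]]


def annotate(tokens: Tokens) -> str:
    """Render a line with every uncertain position marked, e.g. '20[0|C]'."""
    return "".join(t[0] if len(t) == 1 else "[" + "|".join(t) + "]" for t in tokens)


def readings(tokens: Tokens) -> Iterator[str]:
    """Yield every possible reading; the count grows exponentially with uncertain spots."""
    for combo in product(*tokens):
        yield "".join(combo)


def byte_sum(hex_text: str) -> int:
    """Hypothetical record checksum: sum of the bytes, modulo 256."""
    return sum(bytes.fromhex(hex_text)) % 256


def resolve(tokens: Tokens, accept: Callable[[str], bool]) -> list[str]:
    """Keep only readings that pass `accept` (a checksum test, or a simulator run)."""
    return [r for r in readings(tokens) if accept(r)]


if __name__ == "__main__":
    line = [("2",), ("0",), ("0", "C"), ("0", "C")]
    print(annotate(line))                               # 20[0|C][0|C]
    print(resolve(line, lambda r: byte_sum(r) == 236))  # ['20CC']

    # Two uncertain low nibbles: a simple sum cannot tell them apart.
    line2 = [("1",), ("0", "C"), ("2",), ("0", "C")]
    print(resolve(line2, lambda r: byte_sum(r) == 60))  # ['102C', '1C20']
```

The second call shows the limit of a simple additive checksum: changing either low nibble from 0 to C adds 12 to the sum, so 102C and 1C20 both survive and other evidence (program context, or a simulation run) has to decide. Brute force also scales badly: if each of 30 uncertain points had two options, there would be 2^30 (about 1.07 billion) readings, so checking each point separately, as Dwight describes doing in simulation, is far more practical.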


lars＠nocrew.org

1:48 p.m.

Noel Chiappa wrote:

https://walden-family.com/impcode/imp-code.pdf Someone's already done the specialist OCR to deal with faded program listings.

I tried to contact the author about converting some of the other IMP listings, but got no reply.


