Greetings Restorers,
I think a number of us have wanted to restore software that's only
available as a scanned listing from a line printer. The original
printout probably wasn't the best typographic quality, and scanning
doesn't improve it.
As a first pass, OCR with tools like Adobe Acrobat can easily produce
a rough draft of the content in text form, but it takes almost as much
work to correct the many "typos" as it does to simply re-type the listing.
It seems that, with all the AI processing now available, it should be
possible to exploit the limited character set, fixed-pitch font, and
restricted grammar of a typical listing to resolve more of the
ambiguities in character recognition.
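To make the idea concrete, here is a minimal Python sketch of one small
piece of it: post-correcting generic OCR output against a keyword list.
The confusion groups (O/0, I/1/l, etc.) and the FORTRAN-style keywords
are my own assumptions for illustration, not taken from any particular
OCR tool or listing, and the function names are hypothetical.

```python
import itertools
import re

# Assumed confusion groups: glyphs that a scanned line-printer page
# tends to mix up. Adjust for the actual printer chain or font.
CONFUSABLE = ["O0", "I1l", "S5", "B8", "Z2"]
ALTERNATIVES = {ch: group for group in CONFUSABLE for ch in group}

# Assumed vocabulary for a FORTRAN-like listing (illustrative only).
KEYWORDS = {"DO", "GOTO", "IF", "CONTINUE", "DIMENSION",
            "SUBROUTINE", "RETURN", "END", "CALL"}


def candidates(token):
    """Yield every spelling reachable by swapping confusable glyphs."""
    pools = [ALTERNATIVES.get(ch, ch) for ch in token]
    for combo in itertools.product(*pools):
        yield "".join(combo)


def fix_token(token):
    """Pick a keyword if one matches, else a number, else leave as-is."""
    cands = list(candidates(token))
    for cand in cands:
        if cand in KEYWORDS:
            return cand
    # Identifiers rarely start with a digit, so a token that does is
    # assumed to be a number if some candidate is all digits.
    if token[0].isdigit():
        for cand in cands:
            if cand.isdigit():
                return cand
    return token


def fix_line(line):
    """Apply fix_token to each alphanumeric run, keeping the spacing."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: fix_token(m.group()), line)


if __name__ == "__main__":
    print(fix_line("      D0 1O I = 1, 1O0"))
```

Under these assumptions, `D0 1O I = 1, 1O0` comes out as
`DO 10 I = 1, 100`. An identifier such as `I0` is left untouched, since
nothing here can tell whether it was meant as a name or as the number 10;
that would take a real grammar or a human.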
Does anyone have an approach that's more efficient than generic OCR
and a long process of correcting typos on every line of code or comment?
Thanks
/guy