Josh Dersch wrote:
I was considering it, but it's about 950 pages, which makes it a
rather daunting task.
I typed in around that many pages of listings on two different
occasions, once for the HP-41C mainframe ROM source code, and once for
HP 2000C Time-Shared BASIC. Each project took a few weeks of spare
time. However, I don't have spare time any more, so I'm not likely to
do this for the Tek listings.
Is there any OCR software that can deal with this sort of thing? This
is potentially made more complicated by the fact that there are
horizontal lines across most of the pages, which in many cases
intersect with the text (which also isn't the clearest text I've ever
seen).
The aforementioned source code listings were in better shape than this
Tek listing, and didn't have horizontal lines, but I couldn't find any
OCR package that could handle them. It would have taken far longer to
clean up the OCR output than it took me to type it in myself. OCR is
designed for business letters, not code listings.
What I did in each case was:
1) typed in the listing *including* the line numbers, address, and
object code
2) used an awk script to process that into a source file
3) wrote an assembler compatible with the original HP assembler
4) assembled the source code, producing a listing file
5) compared the listing file output by the assembler to the listing I
typed in
6) corrected errors
7) lather, rinse, repeat
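
For anyone wanting to try the same loop, here is a rough sketch of steps
2 and 5 in Python rather than awk. The column layout (line number, hex
address, hex object code, then source text) is an assumption made for
illustration, not the actual HP listing format, and the names
extract_source and listing_mismatches are hypothetical.

```python
import difflib
import re

# ASSUMED listing layout (not the real HP format):
#   <line number> <hex address> <hex object code> <source text>
LISTING_LINE = re.compile(
    r"^\s*(?P<lineno>\d+)\s+(?P<addr>[0-9A-Fa-f]+)"
    r"\s+(?P<obj>[0-9A-Fa-f]+)\s+(?P<src>.*)$"
)


def extract_source(listing_text):
    """Step 2: drop line number, address, and object code; keep the source."""
    out = []
    for line in listing_text.splitlines():
        m = LISTING_LINE.match(line)
        # Lines that do not fit the assumed layout pass through unchanged.
        out.append(m.group("src") if m else line)
    return "\n".join(out) + "\n"


def listing_mismatches(typed_listing, assembled_listing):
    """Step 5: return (typed_lines, assembled_lines) pairs that differ."""
    typed = [line.rstrip() for line in typed_listing.splitlines()]
    built = [line.rstrip() for line in assembled_listing.splitlines()]
    matcher = difflib.SequenceMatcher(a=typed, b=built, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((typed[i1:i2], built[j1:j2]))
    return mismatches
```

Each pair returned by listing_mismatches points at either a typo in the
typed-in listing or a bug in the assembler, which is what drives steps 6
and 7. An empty result means the typed listing and the regenerated
listing agree line for line.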