On 12/9/2010 9:12 PM, Eric Smith wrote:
Josh Dersch wrote:
I was considering it, but it's about 950 pages, which makes it a
rather daunting task.
I typed in around that many pages of listings on two different
occasions, once for the HP-41C mainframe ROM source code, and once for
HP 2000C Time-Shared BASIC. Each project took a few weeks of spare
time. However, I don't have spare time any more, so I'm not likely to
do this for the Tek listings.
That's dedication :).
Is there any OCR software that can deal with this sort of thing? This is
potentially made more complicated by the fact that there are horizontal
lines across most of the pages, which in many cases intersect with the
text (which also isn't the clearest text I've ever seen).
The aforementioned source code listings were in better shape than this
Tek listing, and didn't have horizontal lines, but I couldn't find any
OCR package that could handle them. It would have taken far longer to
clean up the OCR output than it took me to type it in myself. OCR is
designed for business letters, not code listings.
Yeah, that's pretty much what I expected. I guess even if the OCR
software were robust enough to deal with the lines in the images, the
occasional bits of noise in the resulting text would be very annoying to
track down and correct.
What I did in each case was:
1) typed in the listing *including* the line numbers, addresses, and
object code
2) used an awk script to process that into a source file
3) wrote an assembler compatible with the original HP assembler
4) assembled the source code, producing a listing file
5) compared the listing file output by the assembler to the listing I
typed in
6) corrected errors
7) lather, rinse, repeat
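Steps 2 and 5 can be sketched in a few lines of Python. This is a minimal sketch, not the original awk script: the column layout (a decimal line number, an octal address, an octal object word, then the source text) is a hypothetical assumption, and `extract_source` and `compare_listings` are illustrative names. Typing in the object code is what makes step 5 useful: a typo in a mnemonic or operand makes the assembler emit a different word, so the line shows up as a mismatch.

```python
import re
from itertools import zip_longest

# Hypothetical column layout (an assumption, not the actual HP or Tek format):
# decimal line number, octal address, octal object word, then the source text.
# Comment/directive lines with no address or object code still match; the
# optional group is simply skipped and everything after the number is source.
LISTING_LINE = re.compile(r"^\s*(\d+)\s+(?:([0-7]+)\s+([0-7]+)(?:\s|$))?(.*)$")


def extract_source(listing: list[str]) -> list[str]:
    """Step 2: drop line number, address and object code, keeping the source."""
    source = []
    for line in listing:
        m = LISTING_LINE.match(line)
        # Lines that don't start with a line number are passed through unchanged.
        source.append(m.group(4).rstrip() if m else line.rstrip())
    return source


def compare_listings(typed: list[str], assembled: list[str]) -> list[tuple[int, str, str]]:
    """Step 5: return (position, typed, assembled) for every line that differs.

    Whitespace is normalized so only real differences (e.g. a mistyped
    object word) are reported. zip_longest pads the shorter listing with ""
    so a dropped or duplicated line also shows up as a mismatch.
    """
    mismatches = []
    for n, (t, a) in enumerate(zip_longest(typed, assembled, fillvalue=""), start=1):
        if t.split() != a.split():
            mismatches.append((n, t, a))
    return mismatches
```

Each pass of the loop in step 7 is then: fix the lines `compare_listings` reports, regenerate the source, reassemble, and compare again until the list comes back empty.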
I'd love to do something like that with these Tek sources, and then tie
it in with the emulator I've been toying with, but I just don't think I
have the time. It'd be really nice to have the emulator's debugger be
able to match up the current instruction with line(s) from the source
files, etc...
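Matching the current instruction to source lines mostly comes down to an address-to-line table built from the assembler's listing. A minimal sketch, assuming the same hypothetical column layout as a typed listing (line number, octal address, octal object word, source) and hypothetical function names `build_line_map` and `source_at`; a real emulator's debugger hook would look different, and a listing that prints addresses in hex would need a different pattern and radix.

```python
import re

# Hypothetical layout (an assumption): line number, octal address, octal
# object word, optional source text. Octal addresses are also an assumption.
LISTING_ROW = re.compile(r"^\s*(\d+)\s+([0-7]+)\s+([0-7]+)(?:\s(.*))?$")


def build_line_map(listing: list[str]) -> dict[int, list[tuple[int, str]]]:
    """Map each address to the (line number, source text) pairs listed at it."""
    line_map: dict[int, list[tuple[int, str]]] = {}
    for row in listing:
        m = LISTING_ROW.match(row)
        if m is None:
            continue  # comment or directive with no address: nothing to map
        address = int(m.group(2), 8)
        text = (m.group(4) or "").rstrip()
        # A list per address allows several listing lines to share one address.
        line_map.setdefault(address, []).append((int(m.group(1)), text))
    return line_map


def source_at(line_map: dict[int, list[tuple[int, str]]], pc: int) -> list[tuple[int, str]]:
    """Return the source line(s) for the current program counter, if any."""
    return line_map.get(pc, [])
```

Keeping a list per address is a design choice for the "line(s)" case: a label on its own line, or a macro expansion, can map several listing lines to one address.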
Josh