On Dec 21, 2009, at 9:38 PM, Al Kossow wrote:
On 12/21/09 5:35 PM, Jerome H. Fine wrote:
I have about 100,000 lines of code in over 3 dozen PDF files that were
scanned from the hard copy listings. Unfortunately, the original text source
files were lost, so the PDF files are a last resort. Other than typing in the
code by hand from the PDF file, are there any good freeware programs
to convert a PDF back to a text file?
sounds like the TSX-Plus listings I scanned for Lyle.
I spent a little time playing with ocropus and then tesseract, trying to
turn scanned PDP-11 diags back into text. I didn't have good luck. I'd be
interested if others have a working formula.
I did have a little fun "training" tesseract on the line printer font.
I think that technique holds promise, but it needed more data to do a good
job (my initial sample was too small, but it did improve things a lot).
just curious if anyone else has tried training one of the ocr programs
to read line printer fonts.
-brad