On Dec 21, 2009, at 9:38 PM, Al Kossow wrote:
On 12/21/09 5:35 PM, Jerome H. Fine wrote:
I have about 100,000 lines of code in over 3 dozen PDF files that were
scanned from the hard copy listings. Unfortunately, the original text
source files were lost, so the PDF files are a last resort. Other than
typing in the code by hand from the PDF file, are there any good
freeware programs to convert a PDF back to a text file?
sounds like the TSX-Plus listings I scanned for Lyle.
I spent a little time playing with ocropus and then tesseract, trying
to scan pdp-11 diags back to text. I didn't have good luck. I'd be
interested if others have a working formula.
I did have a little fun "training" tesseract on the line printer font.
I think that technique holds promise, but it needed more data to do a
good job (my initial sample was too small, but it did improve things a
lot).
Just curious if anyone else has tried training one of the OCR programs
to read line printer fonts.
Al Kossow is CORRECT!!!!!!!!!! Look for
/pdf/dec/pdp11/tsxPlus/listings/6.40/
at bitsavers. That was a GREAT job Al. THANK YOU!
The original text files were lost. ALL of the PDF files are text!
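
For anyone who wants to try the tesseract route discussed above, here
is a rough sketch of one way to batch it, not a tested recipe. It
assumes poppler's pdftoppm and a current tesseract release are on the
PATH (older releases spell the page-segmentation option differently),
and it does nothing about the line printer font problem. The helper
names, the "lpt" language name, and the file names in the usage note
are all hypothetical.

```python
import subprocess
from pathlib import Path


def pdftoppm_cmd(pdf, prefix, dpi=300):
    """Build the command that rasterizes each page of `pdf` to <prefix>-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def tesseract_cmd(image, out_base, lang="eng", psm=6):
    """Build the command that OCRs one image; tesseract writes <out_base>.txt.

    psm 6 asks tesseract to treat the page as a single uniform block of
    text, which is an assumption about what suits a listing page.
    """
    return ["tesseract", str(image), str(out_base), "-l", lang, "--psm", str(psm)]


def ocr_pdf(pdf, workdir, lang="eng"):
    """Rasterize `pdf`, OCR every page, and return the text joined by form feeds."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_cmd(pdf, workdir / "page"), check=True)
    pages = []
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for png in sorted(workdir.glob("page-*.png")):
        subprocess.run(tesseract_cmd(png, png.with_suffix(""), lang=lang), check=True)
        pages.append(png.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\f".join(pages)
```

Usage would be something like ocr_pdf("listing.pdf", "ocr_work"), or
ocr_pdf("listing.pdf", "ocr_work", lang="lpt") if you had trained
tesseract on a line printer font and installed the result under that
name. Expect to proofread the output by hand either way.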