PDF to Text Conversion (Was: Manual scanning: TIFF-to-PDF software with greyscale support?)

22 Dec 2009

On Dec 21, 2009, at 9:38 PM, Al Kossow wrote:
...
  On 12/21/09 5:35 PM, Jerome H. Fine wrote:
  I have about 100,000 lines of code in over 3 dozen PDF files that were
  scanned from the hard copy listings. Unfortunately, the original text
  source files were lost, so the PDF files are a last resort. Other than
  typing in the code by hand from the PDF file, are there any good
  freeware programs to convert a PDF back to a text file?

 sounds like the TSX-Plus listings I scanned for Lyle. 
I spent a little time playing with ocropus and then tesseract, trying
to scan pdp-11 diags back to text.  I didn't have good luck.  I'd be
interested if others have a working formula.

I did have a little fun "training" tesseract on the line printer font.
I think that technique holds promise, but it needed more data to do a
good job (my initial sample was too small, but it did improve things a
lot).

just curious if anyone else has tried training one of the ocr programs
to read line printer fonts.
-brad
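
For anyone who wants to try the same route, here is a minimal sketch of the PDF-to-text steps discussed above, assuming poppler's pdftoppm and the tesseract command-line tool are installed. It only builds the two command lines and does not run them. The file names and the "lpfont" language name are hypothetical, standing in for a traineddata file produced by training on a line printer font. The --psm spelling is the one used by newer tesseract releases (older ones spell it -psm), and 300 dpi with page segmentation mode 6 are guesses, not tested settings.

```python
import shlex
from pathlib import Path


def rasterize_cmd(pdf, out_prefix, dpi=300):
    """Build a pdftoppm command that renders each PDF page to a greyscale PNG."""
    return ["pdftoppm", "-r", str(dpi), "-gray", "-png", str(pdf), str(out_prefix)]


def ocr_cmd(image, lang="eng", psm=6):
    """Build a tesseract command that writes <image name without suffix>.txt."""
    image = Path(image)
    out_base = image.with_suffix("")  # tesseract appends .txt to this base name
    # psm 6 asks tesseract to treat the page as one uniform block of text,
    # which is an assumption about how a listing page is laid out.
    return ["tesseract", str(image), str(out_base), "-l", lang, "--psm", str(psm)]


if __name__ == "__main__":
    # "listing.pdf", "page-01.png" and "lpfont" are hypothetical names.
    print(shlex.join(rasterize_cmd("listing.pdf", "page")))
    print(shlex.join(ocr_cmd("page-01.png", lang="lpfont")))
```

Each list can be passed to subprocess.run once the tools are in place. Keeping the rasterize and OCR steps separate makes it easy to inspect the page images before blaming the OCR.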
