PDF to Text Conversion (Was: Manual scanning: TIFF-to-PDF software with greyscale support?)

22 Dec 2009

...
 Brad Parker wrote: 
...
  On Dec 21, 2009, at 9:38 PM, Al Kossow wrote: 
  On 12/21/09 5:35 PM, Jerome H. Fine wrote: 
  I have about 100,000 lines of code in over 3 dozen PDF files that were
 scanned from the hard copy listings. Unfortunately, the original text source
 files were lost, so the PDF files are a last resort. Other than typing in the
 code by hand from the PDF file, are there any good freeware programs
 to convert a PDF back to a text file? 
 sounds like the TSX-Plus listings I scanned for Lyle. 
 I spent a little time playing with ocropus and then tesseract, trying to scan
 pdp-11 diags back to text.  I didn't have good luck.  I'd be interested if others
 have a working formula.
 I did have a little fun "training" tesseract on the line printer font.  I think that
 technique holds promise, but it needed more data to do a good job (my initial sample
 was too small, but did improve things a lot).
 Just curious if anyone else has tried training one of the OCR programs to read
 line printer fonts. 
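
 For anyone who wants to try the same experiment, here is a minimal sketch of the
 rasterize-then-OCR approach described above. It is not the exact procedure used in
 this thread. It assumes a reasonably recent tesseract (one that accepts --psm) and
 poppler's pdftoppm are both on the PATH. The function names, the 300 dpi setting and
 the choice of page segmentation mode 6 are all assumptions made for illustration; a
 custom model trained on a line printer font would be selected by passing its name
 as the lang argument.

```python
import shutil
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, prefix, dpi=300):
    """Build a pdftoppm command that renders each PDF page to a greyscale PNG."""
    # 300 dpi is an assumed starting point, not a value taken from the thread.
    return ["pdftoppm", "-r", str(dpi), "-gray", "-png", str(pdf), str(prefix)]


def ocr_cmd(image, out_base, lang="eng"):
    """Build a tesseract command; tesseract writes its text to <out_base>.txt."""
    # --psm 6 asks tesseract to assume a single uniform block of text,
    # which is an assumption that may suit fixed-pitch listings.
    return ["tesseract", str(image), str(out_base), "-l", lang, "--psm", "6"]


def pdf_to_text(pdf, workdir, lang="eng"):
    """Rasterize every page of a PDF, OCR each page, and return the joined text."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    # pdftoppm names its output <prefix>-<page number>.png, zero-padded,
    # so a plain sort keeps the pages in order.
    subprocess.run(rasterize_cmd(pdf, workdir / "page"), check=True)

    pages = []
    for png in sorted(workdir.glob("page-*.png")):
        out_base = png.with_suffix("")
        subprocess.run(ocr_cmd(png, out_base, lang), check=True)
        pages.append(png.with_suffix(".txt").read_text())
    return "\n".join(pages)
```

 Calling pdf_to_text("listing.pdf", "ocr_work") (both names hypothetical) would leave
 one PNG and one text file per page in the work directory, which makes it easy to
 compare the OCR output against the scan page by page.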
Al Kossow is CORRECT!!!!!!!!!!  Look for /pdf/dec/pdp11/tsxPlus/listings/6.40/ at bitsavers.
That was a GREAT job Al.  THANK YOU!
The original text files were lost.  ALL of the PDF files are text!

