manx is dead?

4 Oct 2009

On Sun, 2009-10-04 21:20:46 +0200, Jan-Benedict Glaw <jbglaw at lug-owl.de> wrote:
...
  On Wed, 2009-09-30 13:12:25 -0700, Al Kossow <aek at bitsavers.org> wrote:
  Jan-Benedict Glaw wrote:
  They surely could OCR scanned PDFs, but I'm not sure if they
  will do that.
 I can assure you that they HAVE been doing that on bitsavers
 content for over a year. That was one of the reasons I decided
 to start converting the content. People were contacting me wondering
 why the OCRed version wasn't on line. 
 Maybe the current workflow for creating the PDFs on bitsavers.org
 could be a bit better documented? The docs over there only mention the
 simple conversion via tumble. Its latest NEWS update is dated
 20031209, so quite aged, too. I guess that parts of the workflow are
 different these days?
To put it up for public discussion: my li'l script (used to cut
double-page scans out of multi-page TIFFs and make a nice PDF book of
them) basically uses `tiffsplit' to create one-page TIFFs, `convert' to
convert them to .pbm format, `unpaper' to cut out (and straighten) the
two book pages per TIFF/PBM page, then `ocroscript rec-tess
--tesslanguage=en ...' to OCR each single page, and finally
`HocrConverter.py' [1] to assemble the straightened single-page images
and the hOCR results into a PDF.
All parts of that software stack are in Debian, except the hOCR stuff
(for scanning the pages and generating a file that also contains the
position of the scanned text on the page).
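
For illustration only, here is a rough Python sketch of the first three
steps described above. It just builds the command lines and runs
nothing. The function names, the file naming, and the unpaper options
(`--layout double', `--output-pages 2', a %d output pattern) are
assumptions and not taken from the script itself. The OCR and
HocrConverter.py steps are left out because their full arguments are
not given here.

```python
from pathlib import Path


def plan_split(scan: Path, prefix: str) -> list[str]:
    """Command that splits a multi-page TIFF into one-page TIFFs.

    tiffsplit appends its own suffix (aaa, aab, ...) and ".tif" to the prefix.
    """
    return ["tiffsplit", str(scan), prefix]


def plan_sheet(sheet_tiff: Path) -> list[list[str]]:
    """Commands for one single-page TIFF produced by tiffsplit.

    Hypothetical naming scheme: sheet_aaa.tif -> sheet_aaa.pbm ->
    sheet_aaa-page%d.pbm (one output file per book page).
    """
    stem = sheet_tiff.with_suffix("")       # drop the ".tif" extension
    pbm = f"{stem}.pbm"
    page_pattern = f"{stem}-page%d.pbm"     # assumed unpaper output pattern

    return [
        # ImageMagick picks the output format from the file extension.
        ["convert", str(sheet_tiff), pbm],
        # Assumed options: treat the scan as a two-page spread and
        # write each book page to its own file.
        ["unpaper", "--layout", "double", "--output-pages", "2",
         pbm, page_pattern],
        # The OCR step (ocroscript rec-tess --tesslanguage=en ...) and the
        # HocrConverter.py step would follow here; their remaining
        # arguments are not given, so they are not guessed at.
    ]


if __name__ == "__main__":
    print(" ".join(plan_split(Path("scan.tif"), "sheet_")))
    for argv in plan_sheet(Path("sheet_aaa.tif")):
        print(" ".join(argv))
```

Each returned list could be handed to subprocess.run() once the options
have been checked against the installed versions of the tools.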
Regards, JBG
[1] http://xplus3.net/downloads/HocrConverter.gz, linked from
    http://xplus3.net/2009/04/02/convert-hocr-to-pdf/
--
      Jan-Benedict Glaw      jbglaw at lug-owl.de              +49-172-7608481
Signature of: 23:53 <@jbglaw> Right, I'm going to climb into bed now.
the second  : 23:57 <@jever2> .oO( climb ..., does he still have bars in front of his bed, like my kids used to?)
              00:00 <@jbglaw> jever2: *smack*
              00:01 <@jever2> *ouch*, what for, thoughts are free!
              00:02 <@jbglaw> Nope, free thoughts have been sold out since 1984!
              00:03 <@jever2> 1984? I've only been married since 1985!
