If you OCR, always archive the bitmaps too - Re: Regarding Manuals

26 Sep 2015

On 2015-09-26 4:28 PM, Johnny Billquist wrote:
...
  On 2015-09-26 12:16, Johnny Billquist wrote:
  On 2015-09-25 22:35, Al Kossow wrote:
  I have been going back and applying OCR to the ones on bitsavers.
 Are there some in particular that you have a problem with? 
 Aha. I wasn't aware of that. I've downloaded copies many years ago that
 I've been keeping locally. I'll check out the current versions on
 bitsavers then. 
 Al, exactly how have they been OCRed? Looking at them, it would appear
 that what you see is still the bitmaps of all the pages, but then you
 have the basic text also available for selection/searching.
 My issue with that is that the documents are huge, and the experience
 just scrolling through them is pretty bad. 
IMHO (and I am sure I am not alone in this): software that "recreates"
the typography of a document from OCR does not produce an acceptable
substitute. I've yet to see a book that wasn't ruined by it.
Just worth mentioning for anyone who might be tempted: for this reason
and others, the bitmaps must NEVER be discarded (although, of course, the
bitmaps can be archived in a separate file if people want to supply OCR
as well).
--Toby
...

 Sadly I don't even remember what software I used for OCR about 10 years
 ago, but I had something for Windows back then, which actually figured
 out fonts and all, and created a plain Word document from the OCR
 process. That was a really nice piece of software, which preserved
 formatting, fonts and all. I have a short example of the results at
 http://www.update.uu.se/~bqt/Clarkson.pdf, which was just a scan of two
 pages from a book. I created the PDF from Word.
 A process like that is what I'd like, except for figures, which need to
 be kept as bitmaps, I suspect.
      Johnny
