On Tue, 9 Mar 1999, Pete Turnbull wrote:
That's not what I'd call "high".
That means that on average, you have to
correct or interpret every tenth character. I'd call less than 99% "low",
not high. Our Department looked at this a few years ago, and rejected
anything less than 95%, I think. Even that means correcting (or as one
person put it, "clicking on") one character in every twenty.
That's not what I meant. I did not study the results closely and so I
wrote "high 90%" as a disclaimer to mean something like 98, 98.5, 99,
99.5, or 99.9. Perhaps I should have used the word "range". It seemed to
me that I was getting somewhere between fewer than one and two words per
hundred that needed correcting, and I don't remember any punctuation or
numerical errors.
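To make the arithmetic behind these percentages concrete, here is a rough
sketch in Python; the 2,500 characters-per-page figure is only an assumed
typical value, not something from this thread:

    # Rough sketch: convert OCR character accuracy into expected corrections.
    # The 2,500 characters-per-page figure is an assumed typical value.

    def corrections_per_page(accuracy_pct, chars_per_page=2500):
        """Expected number of characters needing correction on one page."""
        error_rate = (100.0 - accuracy_pct) / 100.0
        return error_rate * chars_per_page

    for acc in (90.0, 95.0, 99.0, 99.5, 99.9):
        print(f"{acc:5.1f}% accuracy -> about "
              f"{corrections_per_page(acc):.0f} corrections per page")

At 90% that is roughly 250 corrections per page, while 99.9% drops it to a
handful, which is why the difference between "90%" and "high 90s" matters.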
William Donzelli wrote:
The best solution for this is to keep the scans AND
the OCR'd text. That
way, with a simple database, one could do searches on the text, and get
most of the hits, yet actually read the images.
A good observation, which brings up the question of whether anyone has
database templates and, if so, what database they are using. How does one
deal with separate text like sidebars and captions? Should you save an image
of the page and individual images in the database along with the text? That
rules out much legacy db software. Perhaps keep individual files and use a
database to index the directory?
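As a minimal sketch of that "index the directory" idea, here is what it might
look like with Python's built-in sqlite3 module; the table and column names,
file paths, and sample search term are all made up for illustration:

    # Sketch: index per-page scan files plus their OCR text, so the text is
    # searchable while the original image stays the document of record.
    # Table/column names and paths are illustrative only.
    import sqlite3

    con = sqlite3.connect("manuals.db")
    con.execute("""
        CREATE TABLE IF NOT EXISTS pages (
            doc_title   TEXT,
            page_number INTEGER,
            image_path  TEXT,   -- path to the scanned page image on disk
            ocr_text    TEXT    -- uncorrected OCR output, used for searching
        )
    """)
    con.execute(
        "INSERT INTO pages VALUES (?, ?, ?, ?)",
        ("PDP-11 Handbook", 1, "scans/pdp11/page001.tif", "raw OCR text here"),
    )
    con.commit()

    # Search the OCR text, then open the matching image to actually read it.
    for title, page, path in con.execute(
        "SELECT doc_title, page_number, image_path FROM pages "
        "WHERE ocr_text LIKE ?", ("%UNIBUS%",)
    ):
        print(f"{title} p.{page}: see {path}")

Sidebars and captions could simply be extra rows (or an extra text column)
pointing at the same image file, so imperfect OCR only hurts searching, not
reading.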
Anyone using document management software? There seem to be three or four
low-priced ones for Windows, a couple for the Mac, maybe something for
another platform (anything for Linux?), and everything else is
stratospherically priced.
Chuck McManis wrote:
300 DPI B&W is good for most printed manuals
_without_ graphics because it
is a 1:1 ratio with what most printers can print. 200 DPI gives you a 2:3
ratio of real pixels to printer pixels and I've seen that introduce banding
on the printed output.
300 dpi is ideal for print. Is that the best ultimate goal - scan at 300 or
at some whole-number multiple or fraction of it, in order to optimize for
eventual printout?
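A quick sketch of the pixel-ratio arithmetic behind the banding point,
assuming a 300 DPI printer as the target (the list of scan resolutions is
just for illustration):

    # Sketch: ratio of scan pixels to printer pixels for an assumed 300 DPI
    # printer. A whole-number ratio scales cleanly; a fractional one forces
    # resampling, which is what can introduce banding.
    from fractions import Fraction

    PRINTER_DPI = 300  # assumed target printer resolution

    for scan_dpi in (150, 200, 300, 400, 600):
        ratio = Fraction(scan_dpi, PRINTER_DPI)
        clean = (PRINTER_DPI % scan_dpi == 0) or (scan_dpi % PRINTER_DPI == 0)
        note = "integer scaling" if clean else "fractional scaling (may band)"
        print(f"{scan_dpi} DPI scan -> {ratio.numerator}:{ratio.denominator} "
              f"scan:print pixels ({note})")

By that reasoning 200 DPI gives the 2:3 ratio Chuck mentions, while 150, 300,
or 600 DPI all map onto a 300 DPI printer without fractional resampling.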
-- Stephen Dauphin