Manual scanning: TIFF-to-PDF software with greyscale support?

14 Dec 2009

Philip Pemberton wrote:
...
> Hi guys,
> I'm after a program that can convert TIFF files into PDFs. I've seen
> Eric Smith's "Tumble" app, which works great... but only for B&W TIFFs.
>
> While I can use ImageMagick to convert the images to B&W, that defeats
> the point: there are photos on the scanned pages, and I'd rather like to
> keep them as photos, not black splodges.
Hmm, what are you using for the ImageMagick command line? I've thrown TIFFs at
it before (at varying levels of bpp) and got sensible PDFs out the back end.
It's possible that it's a versioning/library issue, though - doesn't IM call
into someone else's PDF library to do the actual assembly? Maybe it's this
part that's "broken" on your particular setup.

...
> Also, has anyone come up with a "best practice guide" for manual
> scanning? At the moment I'm scanning like this:
>
> B&W text only: 600dpi, black and white, threshold=50%.
I'd still do those at a few bpp unless the pages were totally free of
contaminants, creases etc. - I've seen cases where 1bpp introduces noise into
the image which can screw up a later OCR attempt. I'd question the threshold
too, unless you've got time to proof-read everything (scans from dot-matrix
printouts, for example, can vary quite a lot in tone, I've found). I think it's
better to keep things "as-is" and treat threshold tweaks as part of a
subsequent "post-process" or OCR phase.
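To illustrate the threshold point: a fixed cutoff applied at scan time discards anything lighter than the cutoff, whereas a greyscale master lets you pick the cutoff later. This is a minimal Python sketch assuming 8-bit greyscale samples (0 = black ink, 255 = white paper); the `threshold` helper and the sample values are hypothetical, not any scanner's actual algorithm.

```python
# Hypothetical sketch: a fixed global threshold applied to 8-bit greyscale
# samples, where 0 = black ink and 255 = white paper.

def threshold(pixels, cutoff):
    """Reduce 8-bit grey values to two levels: 0 (black) if darker than cutoff, else 255 (white)."""
    return [0 if p < cutoff else 255 for p in pixels]

# Made-up sample row: white paper, a dark laser-printed stroke,
# more paper, then a faint dot-matrix stroke.
row = [250, 248, 30, 35, 245, 150, 145, 240]

at_50_percent = threshold(row, 128)  # roughly "threshold=50%"
looser = threshold(row, 170)         # a cutoff chosen after inspecting the greyscale scan

print(at_50_percent)  # the faint dot-matrix stroke (150, 145) has vanished into white
print(looser)         # the faint stroke survives as black
```

With the greyscale original kept, choosing 128 or 170 is a decision you can revisit per document; with a 1bpp scan the faint stroke is already gone.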

I've usually done things at 300 or 400dpi (depending on the content) just to
keep the sizes down a bit - but with storage getting ever cheaper there's
perhaps less incentive to do that now, and 600dpi is fine (more is probably
overkill unless you're trying to do things like fiche).
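For a feel of what "keeping the sizes down" buys, here is a rough Python calculation of the raw, uncompressed data for one page. It assumes a US Letter page (8.5 x 11 in) with rows packed to whole bytes and no compression; `raw_page_bytes` is a hypothetical helper, and real TIFFs using CCITT Group 4 (for 1bpp) or LZW compression will be much smaller on disk.

```python
# Rough raw (uncompressed) size of one scanned page.
# Assumption: US Letter page, 8.5 x 11 inches; each row padded to a whole byte.

PAGE_W_IN, PAGE_H_IN = 8.5, 11.0

def raw_page_bytes(dpi, bits_per_pixel):
    """Bytes needed to hold one uncompressed page at the given resolution and bit depth."""
    width_px = round(PAGE_W_IN * dpi)
    height_px = round(PAGE_H_IN * dpi)
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8  # pad each row to a byte boundary
    return bytes_per_row * height_px

for dpi in (300, 400, 600):
    for bpp in (1, 8):
        print(f"{dpi} dpi, {bpp} bpp: {raw_page_bytes(dpi, bpp) / 1e6:.1f} MB")
```

Doubling the resolution roughly quadruples the raw data, and going from 1bpp to 8bpp multiplies it by about eight, which is the trade-off being weighed above.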

...
> Obviously if there are better ways (in terms of quality and/or speed)
> I'd like to know before I scan a ton of testgear manuals...
I didn't have a sheet-feeder, but a lot of my stuff was comb-bound (and/or I 
had lots of data spread across manuals with low page counts). I did thousands 
of pages by hand, and it was somewhat soul-destroying. :/

...
> Also, does anyone know of an app that can take the PDF file, OCR it and
> then insert the text as a background layer while leaving the image
> alone?
Not me. I chose to delegate the OCR step to future generations (by which time 
OCR will hopefully be a little better anyway) :-)

I couldn't handle scanning all the above content *and* proof-reading the
subsequent OCR (and personally I like physical printouts, so whether it's OCR
or images makes no difference to me).

cheers

Jules
