Manual scanning: TIFF-to-PDF software with greyscale support?

14 Dec 2009

Jules Richardson wrote:
...
  Hmm, what are you using for the ImageMagick command line?

The problem isn't ImageMagick; it's that when you scan a greyscale image 
and then downconvert it to 1bpp, every pixel ends up either black or 
white. The greyscales disappear and any photos basically get turned to crap.

I've been using Xsane to scan --
   For greyscale:
     Res: 600DPI
     BPP: 8
   For B&W:
     Res: 600DPI
     BPP: 1
     Halftoning: None
     Brightness: 0
     Sharpness:  0
     Gamma correction: default
     Threshold: 128

   All saved to TIFF format -- greyscale images get saved as 8bit 
Deflated TIFF data, B&W as CCITT Group 4 FAX.
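
To illustrate why the 1bpp scans lose the photos, here is a rough sketch of what a fixed threshold of 128 does to 8-bit grey values. It is not Xsane's actual code: the function name threshold_pixel is made up for illustration, and treating values at or above the threshold as white is an assumption.

```shell
# Hypothetical helper: map one 8-bit grey value (0-255) to 1bpp output.
# Assumption: values at or above the threshold become white (255),
# everything below becomes black (0).
threshold_pixel() {
  value=$1
  threshold=${2:-128}   # default matches the "Threshold: 128" setting
  if [ "$value" -ge "$threshold" ]; then
    echo 255
  else
    echo 0
  fi
}

# A few sample grey levels: dark grey, mid grey either side of the
# threshold, light grey, and the two extremes.
for v in 0 60 127 128 190 255; do
  echo "$v -> $(threshold_pixel "$v")"
done
```

With these samples, 60 and 127 both come out black and 128 and 190 both come out white, so every intermediate tone in a photo is lost.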

Then to deal with the moire and screening patterns and convert the 
greyscale images to PNGs, I feed them through ImageMagick:

   convert FrontPage-0001.tiff -adaptive-blur 4x4 \
     -units PixelsPerInch -density 600 -resample 300 \
     -format png FrontPage-0001.png
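
For more than one greyscale page, the same options could be wrapped in a loop. This is only a sketch: grey_to_png and the CONVERT override are hypothetical names, and it assumes the input files end in .tiff. It leaves out -format png because convert picks the output format from the .png extension.

```shell
# Hypothetical helper: apply the same blur and 600 -> 300 DPI resample to
# each greyscale TIFF given as an argument, writing a PNG next to it.
# CONVERT is an assumed override so the loop can be dry-run with "echo".
grey_to_png() {
  for src in "$@"; do
    dst="${src%.tiff}.png"   # FrontPage-0001.tiff -> FrontPage-0001.png
    "${CONVERT:-convert}" "$src" -adaptive-blur 4x4 \
      -units PixelsPerInch -density 600 -resample 300 \
      "$dst" || return 1     # stop at the first page that fails
  done
}

# Usage (dry run first, then for real):
#   CONVERT=echo grey_to_png FrontPage-*.tiff
#   grey_to_png FrontPage-*.tiff
```

Resampling from 600 to 300 DPI halves the pixel dimensions in each direction, so each PNG holds a quarter of the pixels of the original scan.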

Then finally, convert to PDF with Tumble:

   tumble -b %F FrontPage-0001.png FrontPage-0002.tiff 0*.tiff \
     -o tumbled_doc.pdf

And view the result:

   evince tumbled_doc.pdf

...
  - doesn't IM make a call into someone else's PDF library to do the
  actual assembly? Maybe it's this part that's "broken" on your
  particular setup.

I'm not using ImageMagick for the PDF conversion; I'm using Tumble.

Also, FYI -- if anyone wants to play with my branch of Tumble (with 
PNG+JP2 support and the TIFF Photometry Tag bugfix), it's online here:
   http://hg.philpem.me.uk/tumble/

This syncs against my private Hg repository roughly every five minutes. 
Either clone the repository (if you use Mercurial you already know how 
to do this) or go to that URL and click the "bz2" link.

@Eric: if you want this fork of Tumble to disappear, let me know 
off-list. I only forked it because it appeared to be completely 
unmaintained.

...
  I'd still do those at a few bpp unless the pages were totally free of
  contaminants, creases etc.

Which these are. They're fresh out of the HP shrink-wrap.

...
  I'd question the threshold too, unless you've got time to proof-read
  everything (things like scans from dot-matrix printouts can vary quite
  a lot in tone I've found, so I think it's better to keep things "as-is"
  and consider things like threshold tweaks as part of a subsequent
  "post-process" or OCR phase)

These seem to be laser-printed. Or if not, they were printed with 
something very similar to a laser printer -- the pages have that 
distinctive toner-on-paper feel.

...
  I didn't have a sheet-feeder, but a lot of my stuff was comb-bound
  (and/or I had lots of data spread across manuals with low page counts).
  I did thousands of pages by hand, and it was somewhat soul-destroying. :/

Yeah, this is pretty boring. I'm basically scanning a few pages, reading 
emails, scanning some more pages, doing some coding, and carrying on 
like that. At least it delays the effects of boredom a little.

Cheers,
-- 
Phil.
classiccmp at philpem.me.uk
http://www.philpem.me.uk/
