Jules Richardson wrote:
> Hmm, what are you using for the ImageMagick command line?
The problem isn't ImageMagick, it's that when you scan a greyscale image
and then downconvert it to 1bpp, every pixel ends up either black or
white. The intermediate greys disappear and any photos basically get
turned to crap.
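
To make that concrete, here's a rough shell sketch of what a fixed cut-off of 128 does to 8-bit grey values. The to_1bpp helper is made up for illustration, and I'm assuming values at or above the cut-off come out white; a given scanner driver may draw the line slightly differently.

```shell
#!/bin/sh
# Sketch only: what a fixed threshold does to 8-bit grey levels.
# Assumption: values >= threshold become white, everything else black.
threshold=128

# Hypothetical helper: map one grey value (0-255) to its 1bpp result.
to_1bpp() {
    if [ "$1" -ge "$threshold" ]; then
        echo white
    else
        echo black
    fi
}

# Print the result for a few sample grey values.
for v in 0 64 127 128 192 255; do
    printf '%3d -> %s\n' "$v" "$(to_1bpp "$v")"
done
```

Everything from 0 to 127 comes out black and everything from 128 up comes out white, so the mid-tones that make up a photo have nowhere to go.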
I've been using Xsane to scan --
  For greyscale:
    Res: 600DPI
    BPP: 8
  For B&W:
    Res: 600DPI
    BPP: 1
    Halftoning: None
    Brightness: 0
    Sharpness: 0
    Gamma correction: default
    Threshold: 128
All saved to TIFF format -- greyscale images get saved as 8-bit
Deflate-compressed TIFF data, B&W as CCITT Group 4 fax.
Then to deal with the moire and screening patterns and convert the
greyscale images to PNGs, I feed them through ImageMagick:
  convert FrontPage-0001.tiff -adaptive-blur 4x4 \
    -units PixelsPerInch -density 600 -resample 300 \
    -format png FrontPage-0001.png
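
If there's more than one greyscale page, the same step can be looped. A rough sketch, assuming the greyscale scans are all named FrontPage-NNNN.tiff (that pattern is a guess, and png_name and descreen are just helper names made up here). I've left out -format png, on the assumption that the .png extension on the output name is what picks the output format.

```shell
#!/bin/sh
# Sketch only: run the descreen-and-downsample step on every greyscale scan.
# Assumption: greyscale pages are named FrontPage-NNNN.tiff (hypothetical).

# Hypothetical helper: map an input TIFF name to its PNG output name.
png_name() {
    printf '%s\n' "${1%.tiff}.png"
}

# Hypothetical helper: blur out the screening pattern, then go 600 -> 300 DPI.
descreen() {
    convert "$1" -adaptive-blur 4x4 \
        -units PixelsPerInch -density 600 -resample 300 \
        "$(png_name "$1")"
}

for f in FrontPage-*.tiff; do
    [ -e "$f" ] || continue   # glob matched nothing, so skip
    descreen "$f"
done
```

The B&W pages are left alone here, since they go into the PDF as Group 4 TIFFs.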
Then finally, convert to PDF with Tumble:
  tumble -b %F FrontPage-0001.png FrontPage-0002.tiff 0*.tiff \
    -o tumbled_doc.pdf
And view the result:
  evince tumbled_doc.pdf
> Doesn't IM make a call into someone else's PDF library to do the
> actual assembly? Maybe it's this part that's "broken" on your
> particular setup.
I'm not using ImageMagick for the PDF conversion, I'm using Tumble.
Also, FYI -- if anyone wants to play with my branch of Tumble (with
PNG+JP2 support and the TIFF Photometry Tag bugfix), it's online here:
http://hg.philpem.me.uk/tumble/
This syncs against my private Hg repository roughly every five minutes.
Either clone the repository (if you use Mercurial you already know how
to do this) or go to that URL and click the "bz2" link.
@Eric: if you want this fork of Tumble to disappear, let me know
off-list. I only forked it because it appeared to be completely
unmaintained.
> I'd still do those at a few bpp unless the pages were totally free of
> contaminants, creases etc.
Which these are. They're fresh out of the HP shrink-wrap.
> I'd question the threshold too, unless you've got time to proof-read
> everything (things like scans from dot-matrix printouts can vary quite
> a lot in tone I've found, so I think it's better to keep things "as-is"
> and consider things like threshold tweaks as part of a subsequent
> "post-process" or OCR phase).
These seem to be laser-printed. Or if not, they were printed with
something very similar to a laser printer -- the pages have that
distinctive toner-on-paper feel.
> I didn't have a sheet-feeder, but a lot of my stuff was comb-bound
> (and/or I had lots of data spread across manuals with low page counts).
> I did thousands of pages by hand, and it was somewhat soul-destroying. :/
Yeah, this is pretty boring. I'm basically scanning a few pages, reading
emails, scanning some more pages, doing some coding, and carrying on
like that. At least it delays the effects of boredom a little.
Cheers,
--
Phil.
classiccmp at philpem.me.uk
http://www.philpem.me.uk/