On Apr 24, 2015, at 9:48 AM, Noel Chiappa <jnc at
mercury.lcs.mit.edu> wrote:
From: shadoooo
I'm scanning at 600dpi grayscale, lossless
compression.
I've been scanning a few things too, and I found that 600dpi grayscale
produced absolutely enormous files (many, many MB's per page, for prints), no
matter what I tried to do, compression-wise.
600dpi black and white, followed by saving as TIFF's with CCITT Group 4
compression, produced immensely smaller files (small 100's of KB's for the
same pages), and they are quite readable (even the fine letter seems to be
readable - b/6 is quite distinguishable, etc).
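For a rough sense of the starting point before any compression, here is a back-of-the-envelope
sketch of the raw (uncompressed) page sizes. The 8.5 x 11 inch page size and the helper name
raw_page_bytes are assumptions for illustration only; neither comes from the posts above, and
actual compressed sizes depend entirely on the page content.

```python
import math

def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scanned page, with each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px

# Assumed page size: US Letter, 8.5 x 11 inches, scanned at 600 dpi.
gray = raw_page_bytes(8.5, 11, 600, 8)    # 8-bit grayscale
bitmap = raw_page_bytes(8.5, 11, 600, 1)  # 1-bit black and white

print(f"grayscale: {gray / 1e6:.1f} MB raw")
print(f"bitmap:    {bitmap / 1e6:.1f} MB raw")
```

So the bitmap starts out about eight times smaller, and Group 4 compression then works on that
1-bit data.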
If you're looking to scan for human consumption, bitmap works ok. But I've found that OCR
programs seem to want grayscale. Why that is, I don't know; they do seem to convert it
to bitmap at some point. Possibly the threshold logic is more complex.
That brings up thresholds. When scanning in, or converting to, bitmap, you have to set the
gray threshold that is the cutoff between white and black. The default would typically be
128 (50%). Depending on the scanner and the condition of the originals, that threshold
may be fine, or it may be far from optimal. A good approach is to scan a number of
representative pages in grayscale, and experiment with different threshold settings to see
which one works best. Basically, you're looking for the compromise between filled-in
loops and broken thin lines. For printed originals, this is probably not all that
critical; for typewritten material, it is far more so.
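
The threshold experiment can be sketched in a few lines of plain Python. This is only an
illustration under assumptions: the function names, the pixel values in the sample, and the
convention that 0 is black and 255 is white are all made up here, not taken from any
particular scanner or OCR package. In practice you would load real grayscale scans and look
at the resulting bitmaps by eye.

```python
def to_bitmap(gray_rows, threshold=128):
    """Convert 8-bit gray values (assumed 0=black, 255=white) to 1=black, 0=white."""
    return [[1 if px < threshold else 0 for px in row] for row in gray_rows]

def black_fraction(bitmap):
    """Fraction of pixels that came out black."""
    total = sum(len(row) for row in bitmap)
    return sum(map(sum, bitmap)) / total

def sweep(gray_rows, thresholds=(96, 128, 160, 192)):
    """Try several thresholds and report how much of the page turns black."""
    return {t: black_fraction(to_bitmap(gray_rows, t)) for t in thresholds}

# Made-up sample: a faint stroke (150) and a dark stroke (40) on light paper (230).
sample = [
    [230, 150, 230, 40, 230],
    [230, 150, 230, 40, 230],
]

for t, frac in sweep(sample).items():
    print(f"threshold {t}: {frac:.0%} black")
```

In this sample the faint stroke vanishes at the default of 128 (a broken thin line) and only
shows up once the threshold is raised above 150; push it too high on a real page and the
loops start to fill in.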
paul