Don wrote:
Hi,
I usually don't bother compressing TIF's of scanned
images -- since I'm not too concerned with saving
space for small documents.
But, recently, I started scanning B-size drawings
...
Suggestions? I had thought of FAX encoding (naive
but it should work well on line drawings/schematics)...
There is a classiccmp knowledge base article on this:
http://www.classiccmp.org/kb/submit_comment.php?FileID=2
However, I have some criticisms of it.
First, it seems to address a few of the "hot buttons" of the author,
Eric Smith, but really isn't very comprehensive guidance for somebody
who would like to start scanning in earnest. What DPI to use? What
pre/post processing is profitable? Trade-offs of the formats?
Software? Scanners? OCR?
Second, I think it is pretty one-sided; real life is more shaded.
(*) "JPEG is blurry". This is not necessarily true. JPEG has a
lossless mode, but it is rarely used and typically gives only about 2:1
compression, so that is just a nit. Even in lossy mode, JPEG can
approach lossless results as the quality setting is raised, but at the
cost of large files.
(*) "G3/G4/TIFF is lossless". Sure, if you are starting from 1 bpp
images, but when scanning, you aren't. Text and line art are continuous
images with sharp edges (but not staircases!). The scanner takes an
area sample of each "point" on the page, and because so many pixels
straddle an edge, many samples fall between 0% and 100% intensity.
Converting to 1 bpp loses oodles of information; this loss isn't
apparent if scanning at high enough resolution or if not zoomed in.
Zoom in far enough, no matter what resolution, and the image will be
clearly quantized. For most practical applications this is not a concern.
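
To make the area-sampling point concrete, here is a minimal sketch in
Python. It models one scan line crossing a single sharp edge; the
function names and the 50% threshold are my own assumptions, not
anything a particular scanner does. Two slightly different edge
positions give different gray samples but identical 1 bpp output.

```python
def area_sample_edge(edge_pos, n_pixels):
    """Ink covers [0, edge_pos) on a line of unit-wide pixels.
    Return the fraction of each pixel that is covered by ink."""
    samples = []
    for i in range(n_pixels):
        covered = min(max(edge_pos - i, 0.0), 1.0)
        samples.append(covered)
    return samples


def threshold(samples, level=0.5):
    """Convert gray samples to 1 bpp: 1 = ink, 0 = paper."""
    return [1 if s >= level else 0 for s in samples]


# Two edges that differ by 0.15 pixel (hypothetical positions).
gray_a = area_sample_edge(2.30, 5)
gray_b = area_sample_edge(2.45, 5)
bits_a = threshold(gray_a)
bits_b = threshold(gray_b)

print(gray_a, bits_a)
print(gray_b, bits_b)
```

The gray samples still record where the edge fell inside the third
pixel; the thresholded versions do not.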
Another way to view the trade-off is that while jpeg tends to throw away
high frequencies, capturing at 1 bpp adds artificial high frequencies.
Decreasing compression for jpg or scanning at higher resolution for 1
bpp capture will reduce their respective problems, but at the cost of
larger file size for both.
There are situations where capturing a scan as bitonal is a bad idea,
and luckily I haven't had too many cases where using a low-loss jpeg
format was necessary. For example, I had some blueline reproductions
that were reductions of an original, had uneven, blotchy tone, with lots
of smudges. I'm sure with enough time and effort it is theoretically
possible to have a filter that adaptively thresholds things *just
right*, but I don't have it. It was easiest just to live with 2 MB jpegs.
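
For what it's worth, the simplest form of adaptive thresholding looks
something like the sketch below. This is a generic local-mean threshold,
not a filter I actually use; the window radius, the offset, and the toy
image are all made-up values. It compares each pixel to the average of
its neighborhood instead of to one global level, which is what uneven
tone calls for. Smudges and blotches would need a good deal more than
this.

```python
def adaptive_threshold(img, radius=2, offset=10):
    """Mark a pixel as ink (1) when it is darker than the mean of
    its (2*radius+1)-square neighborhood by more than `offset`."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [img[j][i] for j in range(y0, y1)
                      for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] < local_mean - offset else 0
    return out


def global_threshold(img, level=128):
    """Mark a pixel as ink (1) when it is darker than one fixed level."""
    return [[1 if v < level else 0 for v in row] for row in img]


# Toy page: background fades from 200 to 90 left to right, with two
# vertical lines (columns 2 and 9) drawn 60 levels below background.
row = [200 - 10 * x for x in range(12)]
row[2] -= 60
row[9] -= 60
img = [list(row) for _ in range(5)]

adaptive = adaptive_threshold(img)
fixed = global_threshold(img)
print(adaptive[0])
print(fixed[0])
```

On this toy page the fixed level turns the whole dark end of the
gradient into ink, while the local-mean version picks out only the two
lines.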
Another situation where jpeg is better than 1 bpp, no matter what file
format, is when you just don't care about file size and want the
ultimate in fidelity (in that case, you can argue that you should be
using bmp files).
Since this subject comes up pretty often, the article could use a
rewrite, as it is lacking a lot of information, even if you disagree
with my criticisms.
So as not to sound too pedantic about all of this, file size is still an
issue and most of the documents that I scan are relatively clean; I find
that 1 bpp images captured at 300 - 400 dpi (more if the text or images
are particularly fine) are the best choice, along with the occasional
jpeg insert.
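
To put rough numbers on why file size still matters, here is a small
Python sketch of the uncompressed size of one page at various settings.
I am assuming a B-size sheet is 11 x 17 inches and ignoring any row
padding or file headers; the function name is just for illustration.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in bytes for a page scanned at `dpi`."""
    width_px = width_in * dpi
    height_px = height_in * dpi
    return width_px * height_px * bits_per_pixel // 8


# 11 x 17 inches is an assumed B-size sheet.
for dpi in (300, 400):
    for bpp in (1, 8, 24):
        size = raw_bytes(11, 17, dpi, bpp)
        print(f"{dpi} dpi, {bpp:2d} bpp: {size / 1e6:6.1f} MB uncompressed")
```

These are the starting points before G4 or jpeg compression is applied;
the compressed sizes depend heavily on the content of the page.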
As for Barry's rule of thumb for jpg image size, there is a wide range
of tolerance that people are willing to live with, and I don't doubt his
10KB produces fine results. I don't understand the comment that B&W
images should be 1/3 the size of color images; with the usual 2x2 chroma
subsampling, a 300 dpi color jpg image stores intensity at 300 dpi and
two channels of color differences at 150 dpi (that is, only about 1/3 of
the samples in a color jpg file carry color information). Stripping out
color should save only about 1/3 of the file size, not 2/3.
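
The 1/3 figure is just sample counting, which can be checked with a few
lines of Python. This sketch counts raw samples only, assuming one
full-resolution intensity channel and two color-difference channels at
half resolution in each direction; actual encoded bytes will vary with
the image and the quantization settings.

```python
from fractions import Fraction


def sample_shares(chroma_scale):
    """Return (intensity share, color share) of the total samples when
    the two color channels are stored at `chroma_scale` times the
    intensity resolution in each direction."""
    luma = Fraction(1)
    chroma = 2 * Fraction(chroma_scale) ** 2
    total = luma + chroma
    return luma / total, chroma / total


# 150 dpi color channels against 300 dpi intensity: scale of 1/2.
luma_share, chroma_share = sample_shares(Fraction(1, 2))
print(luma_share, chroma_share)
```

Each half-resolution color channel has 1/4 as many samples as the
intensity channel, so the two together add 1/2, and 1/2 out of 3/2 is
1/3.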