Scanning Suggestions (Bookmarks & Colour)

Paul Koning paulkoning at comcast.net
Fri Aug 27 19:32:25 CDT 2021



> On Aug 27, 2021, at 5:36 PM, Antonio Carlini <a.carlini at ntlworld.com> wrote:
> 
> On 27/08/2021 22:05, Paul Koning wrote:
>> JPG is the wrong tool for pages with color text or color line art. As I've mentioned before, JPG is fit ONLY for photos, not for any image with hard edges. Text compressed with JPG will suffer badly. 
> 
> 
> Yes, true. I thought that for colour, all I could get was JPEG. It certainly seems to be the case that the HP PhotoSmart I have scans everything as JPEG 300 dpi when you use the front panel to scan to a memory stick. Post processing wouldn't make that any better, which is why I thought I was stuck with JPEG.

Wow, that's crazy.  Perhaps they thought the product was only going to be used by consumers who have no clue.

> It turns out though that if you drive it with a computer then you also get the choice of TIFF or PNG as additional choices. TIFF is likely to be quite a bit too big. I'll try PNG and see how big the files it generates are. I've no idea what the default compression is straight out of the software but as long as it's lossless I can hopefully post-process to squeeze things down if possible.

TIFF is (normally) lossless.  PNG is too, as far as I know, but I don't understand it as well.

TIFF is actually a container and inside it can be any number of encodings.  Compression schemes can be simple ones like run length coding, or more complex ones like LZ.  Either way, if there are patterns, especially significant areas of the same color, the compression works very well indeed.
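
To illustrate why large areas of the same color help, here is a minimal sketch of run length coding in Python.  It is a toy scheme for illustration only, not the actual byte format that TIFF uses; the function names and the sample scan lines are made up for this example.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse a sequence of pixel values into (count, value) runs."""
    return [(len(list(group)), value) for value, group in groupby(pixels)]

def rle_decode(runs):
    """Expand (count, value) runs back into the original pixel list."""
    out = []
    for count, value in runs:
        out.extend([value] * count)
    return out

# Hypothetical scan line: pure white background (255) with a short black stroke (0).
clean_row = [255] * 40 + [0] * 5 + [255] * 55

# The same line as a raw scan might deliver it: the background jitters
# between several near-white values, so neighbouring pixels rarely match.
noisy_row = (
    [250 + (i % 5) for i in range(40)]
    + [0] * 5
    + [250 + (i % 5) for i in range(55)]
)

clean_runs = rle_encode(clean_row)
noisy_runs = rle_encode(noisy_row)

print("runs needed for the clean line:", len(clean_runs))
print("runs needed for the noisy line:", len(noisy_runs))
```

The clean line collapses to a handful of runs, while the noisy line needs nearly one run per pixel, which is why a uniform background matters so much to the simpler schemes.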

A raw scan probably won't compress well.  But something as simple as a white point adjustment to make the bulk of the background be full white will make the file very much smaller.  If you tweak the black point some as well, so areas meant to be black are in fact full black rather than slightly-varying grays, you will gain still more.  As a bonus, the resulting image will also be much crisper and easier to read.
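
The effect of a white point and black point adjustment can be sketched with nothing but the Python standard library.  Everything here is an assumption made for illustration: the "scan" is synthetic noise around a light paper gray and a dark ink gray, the thresholds of 60 and 200 are arbitrary, and zlib stands in for whatever lossless coder the TIFF or PNG writer would actually use.

```python
import random
import zlib

def synthetic_scan(width=200, height=200, seed=1):
    """Build a fake 8-bit grayscale page: noisy light paper with noisy dark bars."""
    rng = random.Random(seed)
    rows = []
    for y in range(height):
        row = bytearray()
        for x in range(width):
            is_ink = (y % 20 < 4) and (20 <= x < 180)  # horizontal "text" bars
            base = 30 if is_ink else 225               # dark gray ink, light gray paper
            row.append(base + rng.randint(-15, 15))    # scanner noise
        rows.append(bytes(row))
    return b"".join(rows)

def apply_levels(pixels, black_point=60, white_point=200):
    """Clamp values at or below black_point to 0 and at or above white_point
    to 255, stretching the values in between linearly."""
    span = white_point - black_point
    table = bytes(
        0 if v <= black_point
        else 255 if v >= white_point
        else round((v - black_point) * 255 / span)
        for v in range(256)
    )
    return pixels.translate(table)

raw = synthetic_scan()
cleaned = apply_levels(raw)

raw_size = len(zlib.compress(raw, 9))
cleaned_size = len(zlib.compress(cleaned, 9))

print("uncompressed bytes:       ", len(raw))
print("raw scan, compressed:     ", raw_size)
print("after levels, compressed: ", cleaned_size)
```

On this synthetic page the adjusted image contains only full black and full white, so the lossless coder has long uniform stretches to work with.  A real scan will have anti-aliased edges and will not shrink quite as dramatically.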

The other day there was a mention of the open source tools at leptonica.org.  From the examples given in the intro, for example here: http://www.leptonica.org/binarization.html it looks like a very nice tool kit for cleaning up images well and easily.  While I don't see it mentioned, the cleaned up images will certainly compress very effectively in TIFF.

	paul


