On Aug 27, 2021, at 5:36 PM, Antonio Carlini
<a.carlini at ntlworld.com> wrote:
On 27/08/2021 22:05, Paul Koning wrote:
JPG is the wrong tool for pages with color text
or color line art. As I've mentioned before, JPG is fit ONLY for photos, not for any
image with hard edges. Text compressed with JPG will suffer badly.
Yes, true. I thought that for colour, all I could get was JPEG. It certainly seems to be
the case that the HP PhotoSmart I have scans everything as JPEG 300 dpi when you use the
front panel to scan to a memory stick. Post processing wouldn't make that any better,
which is why I thought I was stuck with JPEG.
Wow, that's crazy. Perhaps they thought the product was only going to be used by
consumers who have no clue.
It turns out, though, that if you drive it with a
computer then you also get TIFF or PNG as additional choices. TIFF is likely
to be quite a bit too big. I'll try PNG and see how big the files it generates are.
I've no idea what the default compression is straight out of the software but as long
as it's lossless I can hopefully post-process to squeeze things down if possible.
TIFF is (normally) lossless. I think PNG also, or at least can be, but I don't
understand it as well.
TIFF is actually a container and inside it can be any number of encodings. Compression
schemes can be simple ones like run length coding, or more complex ones like LZ. Either
way, if there are patterns, especially significant areas of the same color, the
compression works very well indeed.
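
To make the point about uniform areas concrete, here is a rough Python sketch. It is not what any TIFF writer actually runs: rle_encode is a toy run-length coder I made up for illustration, and zlib (DEFLATE) is only standing in for the LZ-type schemes. The 10000-byte "scan lines" are invented test data.

```python
import random
import zlib


def rle_encode(data):
    """Toy run-length coder: returns a list of (value, run_length) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


# A perfectly uniform white background: every byte is 255.
flat = bytes([255]) * 10000

# A "raw scan" style background: near-white, but each pixel varies a little.
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = bytes(rng.randint(240, 255) for _ in range(10000))

flat_runs = rle_encode(flat)
noisy_runs = rle_encode(noisy)
flat_z = len(zlib.compress(flat))
noisy_z = len(zlib.compress(noisy))

print("RLE runs:   flat =", len(flat_runs), " noisy =", len(noisy_runs))
print("zlib bytes: flat =", flat_z, " noisy =", noisy_z)
```

The flat line collapses to a single run, while the slightly varying one breaks into thousands of short runs, and zlib shows the same kind of gap.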
A raw scan probably won't compress well. But something as simple as a white point
adjustment to make the bulk of the background be full white will make the file very much
smaller. If you tweak the black point some as well, so areas meant to be black are in
fact full black rather than slightly-varying grays, you will gain still more. As a bonus,
the resulting image will also be much crisper and easier to read.
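
A minimal sketch of that white point / black point adjustment, in plain Python on 8-bit grayscale values. Everything here is an assumption for illustration: adjust_levels is a hypothetical helper, the thresholds 50 and 220 are picked by hand for the fake data, and the "scan" is just random near-black and near-white bytes. A real image tool would do the same thing per pixel on the actual scan.

```python
import random
import zlib


def adjust_levels(pixels, black_point, white_point):
    """Map values <= black_point to 0 and >= white_point to 255,
    stretching anything in between linearly."""
    span = white_point - black_point
    out = bytearray()
    for p in pixels:
        if p <= black_point:
            out.append(0)
        elif p >= white_point:
            out.append(255)
        else:
            out.append(round((p - black_point) * 255 / span))
    return bytes(out)


# Fake scan: dark "ink" (10..40) in regular stripes on a light,
# slightly varying "paper" background (225..250).
rng = random.Random(1)  # fixed seed so the run is repeatable
raw = bytes(
    rng.randint(10, 40) if (i // 7) % 10 == 0 else rng.randint(225, 250)
    for i in range(20000)
)

cleaned = adjust_levels(raw, black_point=50, white_point=220)

raw_size = len(zlib.compress(raw))
cleaned_size = len(zlib.compress(cleaned))
print("compressed bytes: raw =", raw_size, " cleaned =", cleaned_size)
```

After the adjustment the background is exactly 255 and the ink exactly 0, so the lossless compressor has long uniform stretches to work with instead of noise.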
The other day there was a mention of open source tools at leptonica.org: from the examples
given in the intro, for example here: http://www.leptonica.org/binarization.html, it looks
like a very nice tool kit to clean up images very well and easily. While I don't see
it mentioned, the cleaned up images will certainly compress very effectively in TIFF.