Scanning docs for bitsavers

Eric Smith spacewar at gmail.com
Tue Dec 3 11:27:52 CST 2019


On Mon, Dec 2, 2019 at 5:34 PM Guy Dunphy via cctalk <cctalk at classiccmp.org>
wrote:

> Mentioning JBIG2 (or any of its predecessors) without noting that it is
> completely unacceptable as a scanned document compression scheme,
> demonstrates a lack of awareness of the defects it introduces in encoded
> documents.
>

Perhaps you are not aware that the JBIG2 standard has a lossless mode.
Certainly JBIG2 lossy mode is _extremely_ lossy, but lossless mode doesn't
have those problems.

It's entirely possible that the common JBIG2 encoders either don't offer
lossless mode, or don't make it easy to configure.

> G4 compression was invented for fax machines. No one cared much about visual
> quality of faxes, they just had to be readable. Also the technology of fax
> machines was only capable of two-tone B&W reproduction, so that's what G4
> encoding provided.
>
> Thinking these kinds of visual degradation of quality are acceptable when
> scanning documents for long term preservation, is both short sighted and
> ignorant of what can already be achieved with better technique.
>

When used at an appropriate resolution (e.g., not 100 DPI), G4 encoding is
perfectly fine for bilevel documents (text and line art) that are in good
condition. If the documents were originally bilevel but have suffered from
significant degradation in reproduction, then they are effectively no
longer bilevel, and G4 (at any resolution) is inappropriate.

> And therefore why PDF isn't acceptable as a
> container for long term archiving of _scanned_ documents for historical
> purposes.
>

You state that as if it were a fact universally agreed upon, which it
clearly is not.

If you despise PDF as an archival format, by all means please feel free to
NOT avail yourself of the hundreds of thousands of pages of archives in PDF
format, e.g., on Bitsavers.

I'm of course not claiming that PDF is perfect, nor that G4 encoding is.

