On Aug 27, 2021, at 4:50 PM, Antonio Carlini via
cctalk <cctalk at classiccmp.org> wrote:
I have a few manuals to scan and I'm looking for suggestions, about how to add
bookmarks and how to handle colour.
...
For photographs or shaded areas that don't necessarily come out well under those
settings, I plan to use 8-bit greyscale. I'd prefer to use 600dpi but I may have to
fall back to 300dpi if the per-page file size shoots up too much.
Depending on the resolution used, given that the photos are printed as halftone (black
dots of various sizes), you may get weird scan artifacts such as moire patterns. Some
scan programs may have tools to convert a halftone image to the equivalent grayscale
image; such a thing is likely to be helpful.
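
To illustrate what such a halftone-to-grayscale tool does in principle, here is a
minimal sketch in Python. It is not what any particular scan program does. It assumes
the halftone has been scanned bilevel, that one halftone cell covers roughly
cell x cell scanned pixels, and that the screen is aligned with the pixel grid (real
screens are usually rotated, so real tools are more careful). The names descreen and
cell are made up for this example.

```python
def descreen(bitmap, cell):
    """Turn a bilevel halftone scan into a smaller greyscale image.

    bitmap: list of rows, each a list of 0 (paper) or 1 (ink).
    cell:   assumed size of one halftone cell in scanned pixels.
    Returns rows of 8-bit grey values (255 = white, 0 = black),
    one value per cell x cell block of the input.
    """
    height = len(bitmap)
    width = len(bitmap[0])
    out = []
    for y in range(0, height - cell + 1, cell):
        row = []
        for x in range(0, width - cell + 1, cell):
            # Count the ink pixels inside this block.
            ink = sum(
                bitmap[y + dy][x + dx]
                for dy in range(cell)
                for dx in range(cell)
            )
            # Ink coverage of the block becomes its grey level.
            coverage = ink / (cell * cell)
            row.append(int(255 * (1 - coverage) + 0.5))
        out.append(row)
    return out


if __name__ == "__main__":
    scan = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 1, 0, 0],
    ]
    for grey_row in descreen(scan, 2):
        print(grey_row)
```

The output has one pixel per halftone cell, so resolution drops by the cell size, which
is the price of averaging the dots away.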
The real issue is colour. I know that various people
have looked at the issue of how to efficiently scan pages that are mostly black and white
but have some coloured text (RSX-11 manuals and early VMS manuals did this to highlight
terminal input, for example). I don't think this is a solved problem and I'm not
expecting a solution; what I'm really looking for is to check that what I'm about
to produce will have all the information that a future efficient algorithm is likely to
need.
I'm going to start by scanning the whole manual as though it had no colour (so 600
dpi bilevel G4 encoded, except for pages with photos and shading and so on). Then I'm
going to go back and rescan the pages that have colour and scan those at 600 dpi and save
as a JPG.
JPG is the wrong tool for pages with color text or color line art. As I've mentioned
before, JPG is fit ONLY for photos, not for any image with hard edges. Text compressed
with JPG will suffer badly.
For material such as the RSX manuals you mentioned, the tool needed is a compression
algorithm that handles color with hard edges faithfully. Basically that means a lossless
compression scheme. That should be fine, since pages like that should compress very well,
at least if the scan has been touched up just a bit to make the page background reasonably
pure white. With more effort it would be possible to reconstruct the original three-color
material (white, black, red or whatever), but that's a fair amount harder and probably
not necessary for adequate compression. But please, make it a practice to avoid JPG
except in those cases (rare or non-existent in document scanning work) where you're
actually dealing with a continuous tone photograph.
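
As a rough illustration of why cleaned-up pages compress so well losslessly, here is a
small Python sketch. Everything in it is an assumption made for the example: the page
is synthetic, the three ink colours in PALETTE are guesses, and snapping each pixel to
the nearest palette colour is only the crudest form of the reconstruction mentioned
above (a real scan has soft edges and uneven lighting that this ignores). It uses
zlib, the same Deflate compression that PNG is built on, to compare sizes.

```python
import random
import zlib

# Assumed page colours: white paper, black text, red highlighted text.
PALETTE = [(255, 255, 255), (0, 0, 0), (200, 0, 0)]


def nearest(pixel, palette=PALETTE):
    """Return the palette colour closest to pixel (squared RGB distance)."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))


def snap(pixels, palette=PALETTE):
    """Replace every pixel by its nearest palette colour."""
    return [nearest(p, palette) for p in pixels]


def deflated_size(pixels):
    """Size in bytes of the raw RGB data after lossless Deflate compression."""
    raw = bytes(v for p in pixels for v in p)
    return len(zlib.compress(raw, 9))


def synthetic_page(width=200, height=100, seed=1):
    """Fake scan: a black bar and a red bar on white, plus slight sensor noise."""
    rng = random.Random(seed)
    pixels = []
    for y in range(height):
        for x in range(width):
            if 20 <= y < 30 and 20 <= x < 180:
                base = (0, 0, 0)
            elif 50 <= y < 60 and 20 <= x < 120:
                base = (200, 0, 0)
            else:
                base = (255, 255, 255)
            pixels.append(
                tuple(min(255, max(0, v + rng.randint(-6, 6))) for v in base)
            )
    return pixels


if __name__ == "__main__":
    noisy = synthetic_page()
    clean = snap(noisy)
    print("noisy scan, deflated:  ", deflated_size(noisy), "bytes")
    print("cleaned scan, deflated:", deflated_size(clean), "bytes")
```

The cleaned version still has perfectly hard edges, since nothing lossy was applied
after the touch-up; only the noise that carried no information is gone.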
paul