Scanning Suggestions (Bookmarks & Colour)

Paul Koning paulkoning at
Fri Aug 27 16:05:47 CDT 2021

> On Aug 27, 2021, at 4:50 PM, Antonio Carlini via cctalk <cctalk at> wrote:
> I have a few manuals to scan and I'm looking for suggestions, about how to add bookmarks and how to handle colour.
> ...
> For photographs or shaded areas that don't necessarily come out well under those settings, I plan to use 8-bit greyscale. I'd prefer to use 600dpi but I may have to fall back to 300dpi if the per-page file size shoots up too much.

Depending on the resolution used, given that the photos are printed as halftone (black dots of various sizes), you may get weird scan artifacts.  Some scan programs may have tools to convert a halftone image to the equivalent grayscale image; such a tool is likely to be helpful.
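
To illustrate the idea of turning a halftone back into grayscale, here is a minimal sketch in plain Python.  It is not what any particular scan program does; the 4x4 clustered-dot matrix, the function names, and the assumption that the halftone cell size is known and aligned to the pixel grid are all mine, made up for illustration.  A real descreening tool has to cope with screen angles and unknown cell pitch, which this does not attempt.

```python
# Hypothetical 4x4 clustered-dot threshold matrix: low values sit in the
# middle, so the ink dot grows outward from the cell center as the tone
# gets darker (black dots of various sizes).
CLUSTER4 = [
    [12, 5, 6, 13],
    [4, 0, 1, 7],
    [11, 3, 2, 8],
    [15, 10, 9, 14],
]


def halftone(gray):
    """Simulate printing: gray is a list of rows of 0..255 values.

    Returns rows of 0 (ink) or 255 (paper).
    """
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, value in enumerate(row):
            threshold = (CLUSTER4[y % 4][x % 4] + 0.5) * 16
            darkness = 255 - value
            out_row.append(0 if darkness > threshold else 255)
        out.append(out_row)
    return out


def descreen(bilevel, cell=4):
    """Average each cell x cell block of a bilevel image into one gray value.

    Assumes the halftone cells are aligned with the pixel grid; partial
    cells at the right and bottom edges are dropped.
    """
    height, width = len(bilevel), len(bilevel[0])
    out = []
    for by in range(0, height - height % cell, cell):
        out_row = []
        for bx in range(0, width - width % cell, cell):
            total = sum(
                bilevel[by + dy][bx + dx]
                for dy in range(cell)
                for dx in range(cell)
            )
            out_row.append(total // (cell * cell))
        out.append(out_row)
    return out


def flat_patch(value, size=8):
    """Build a uniform gray test patch of size x size pixels."""
    return [[value] * size for _ in range(size)]


if __name__ == "__main__":
    for tone in (64, 128, 192):
        recovered = descreen(halftone(flat_patch(tone)))
        print(tone, recovered[0][0])
```

The averaging trades resolution for tone: each 4x4 cell collapses to a single gray pixel, which is one reason to scan halftones at the higher resolution if the file size allows it.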

> The real issue is colour. I know that various people have looked at the issue of how to efficiently scan pages that are mostly black and white but have some coloured text (RSX-11 manuals and early VMS manuals did this to highlight terminal input, for example). I don't think this is a solved problem and I'm not expecting a solution, what I'm really looking for is to check that what I'm about to produce will have all the information that a future efficient algorithm is likely to need.
> I'm going to start by scanning the whole manual as though it had no colour (so 600 dpi bilevel G4 encoded, except for pages with photos and shading and so on). Then I'm going to go back and rescan the pages that have colour and scan those at 600 dpi and save as a JPG.

JPG is the wrong tool for pages with color text or color line art.  As I've mentioned before, JPG is fit ONLY for photos, not for any image with hard edges.  Text compressed with JPG will suffer badly, because the lossy compression smears and rings around sharp edges.

For material such as the RSX manuals you mentioned, the tool needed is a compression algorithm that handles color with hard edges faithfully.  Basically that means a lossless compression scheme.  That should be fine, since pages like that should compress very well, at least if the scan has been touched up just a bit to make the page background reasonably pure white.  With more effort it would be possible to reconstruct the original three-color material (white, black, red or whatever), but that's a fair amount harder and probably not necessary for adequate compression.  But please, make it a practice to avoid JPG except in those cases (rare or non-existent in document scanning work) where you're actually dealing with a continuous tone photograph.
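
As a rough illustration of why cleaning up the background helps a lossless scheme, here is a small sketch in plain Python.  Everything in it is an assumption for the sake of the example: the synthetic "page", the three-color palette (white, black, and a red I picked arbitrarily), the function names, and the use of zlib's DEFLATE as a stand-in for a lossless image format (PNG uses DEFLATE internally, but this is not PNG).  It snaps every pixel to the nearest palette color and compares the compressed sizes.

```python
import random
import zlib

# Hypothetical palette: paper white, ink black, highlight red.
PALETTE = [(255, 255, 255), (0, 0, 0), (200, 0, 0)]


def nearest(pixel, palette=PALETTE):
    """Return the palette color closest to pixel (squared RGB distance)."""
    return min(
        palette,
        key=lambda color: sum((a - b) ** 2 for a, b in zip(pixel, color)),
    )


def snap(pixels, palette=PALETTE):
    """Map every pixel of a flat pixel list to its nearest palette color."""
    return [nearest(p, palette) for p in pixels]


def deflate_size(pixels):
    """Size in bytes of the raw RGB data after lossless DEFLATE compression."""
    raw = bytes(channel for pixel in pixels for channel in pixel)
    return len(zlib.compress(raw, 9))


def synthetic_page(width=200, height=100, seed=1):
    """Fake scan: noisy off-white paper with one black bar and one red bar."""
    rng = random.Random(seed)
    page = []
    for y in range(height):
        for x in range(width):
            if 20 <= y < 30 and 20 <= x < 180:
                base = (0, 0, 0)
            elif 50 <= y < 60 and 20 <= x < 180:
                base = (200, 0, 0)
            else:
                base = (245, 243, 238)
            # Add scanner-like noise and clamp to the valid 0..255 range.
            page.append(
                tuple(min(255, max(0, v + rng.randint(-8, 8))) for v in base)
            )
    return page


if __name__ == "__main__":
    page = synthetic_page()
    cleaned = snap(page)
    print("noisy scan:  ", deflate_size(page), "bytes")
    print("three colors:", deflate_size(cleaned), "bytes")
```

On a real scan the edges of the text are anti-aliased, so a hard snap like this would make the glyphs jagged; that is part of why the full three-color reconstruction is harder than just whitening the background.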

