Scanning Suggestions (Bookmarks & Colour)
Paul Flo Williams
paul at frixxon.co.uk
Thu Sep 2 14:51:20 CDT 2021
With apologies for breaking the threading, as I've just rejoined and I'm
responding to something I've just spotted in the archive ...
Regarding colour separations for scanned documents, GraphicsMagick is
quite capable of producing the required individual colour layers. If
you identify the colours you wish to pull out, you can use the "-fuzz"
and "-opaque" operators to change any given colour range (fuzz uses
Euclidean distance in RGB space) into another one (the current "-fill"
colour).
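
To make the fuzz idea concrete, here is a minimal Python sketch of "replace every pixel within a given Euclidean distance of a target colour". It is not GraphicsMagick's code, and the way GraphicsMagick scales its fuzz value may differ; the function names and sample pixels are made up for the example.

```python
import math


def within_fuzz(pixel, target, fuzz):
    """True if pixel is within Euclidean distance `fuzz` of target in RGB space."""
    return math.dist(pixel, target) <= fuzz


def replace_colour(pixels, target, fill, fuzz):
    """Copy `pixels` (rows of RGB tuples), setting near-target pixels to `fill`."""
    return [[fill if within_fuzz(p, target, fuzz) else p for p in row]
            for row in pixels]


# One row: paper white, two slightly different blues, and black ink.
row = [(250, 250, 250), (20, 40, 200), (30, 50, 190), (0, 0, 0)]
result = replace_colour([row], target=(25, 45, 195), fill=(0, 0, 0), fuzz=20.0)
```

Both blues sit about 8.7 units from the target, so they are replaced; the white and black pixels are far outside the fuzz radius and are left alone.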
I haven't finished writing this up, but my workflow tends to be to
produce a Group4 TIFF from the colour scan by simple thresholding (or
first dropping the other colours to white, if they are quite dark), and
then produce all the other separations by dropping black out,
converting your spot colour to black and then thresholding. This way
you get two or more images:
1) PNG(s) containing pixels that are all either white or your spot
colour, and
2) a G4 TIFF for the black and white layer.
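
The two passes described above can be sketched in plain Python on a tiny list of pixels. This is only an illustration under assumptions: the threshold of 128, the simple average used as brightness, the separate fuzz value for black, and all names are hypothetical, not the settings actually used on the scans.

```python
import math

WHITE, BLACK = (255, 255, 255), (0, 0, 0)


def brightness(p):
    """Plain average of the channels; a stand-in for a real luminance formula."""
    return sum(p) / 3


def black_layer(pixels, spot, fuzz, threshold=128):
    """Drop the spot colour to white, then threshold. 1 marks black ink."""
    out = []
    for row in pixels:
        new_row = []
        for p in row:
            if math.dist(p, spot) <= fuzz:
                p = WHITE                      # spot colour must not survive
            new_row.append(1 if brightness(p) < threshold else 0)
        out.append(new_row)
    return out


def spot_layer(pixels, spot, fuzz, black_fuzz, threshold=128):
    """Drop black to white, turn the spot colour black, then threshold."""
    out = []
    for row in pixels:
        new_row = []
        for p in row:
            if math.dist(p, BLACK) <= black_fuzz:
                p = WHITE                      # black text drops out
            elif math.dist(p, spot) <= fuzz:
                p = BLACK                      # spot colour becomes the ink
            new_row.append(1 if brightness(p) < threshold else 0)
        out.append(new_row)
    return out


page = [[(255, 255, 255), (10, 10, 10), (30, 60, 200), (240, 240, 235)]]
blue = (30, 60, 200)
black_mask = black_layer(page, blue, fuzz=30)
spot_mask = spot_layer(page, blue, fuzz=30, black_fuzz=60)
```

The blue pixel here has an average brightness below 128, so without the "drop to white" step it would leak into the black layer; that is the "quite dark" case.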
The PNGs must be saved as two-colour paletted images so that they can
be used as masks in the final PDF. I always apply the black and white
(text) layer on top of every page, so that the fuzzing of the colour
layers doesn't reduce the clarity of the text.
This might sound awkward, but I've found that one fuzz value tends to
work for all the pages when extracting a given colour, so you can
process all pages in a loop. I use the Perl module PDF::Builder to put
my scans together, but I think tumble is capable of overlays too.
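
As a rough sketch of the per-page loop, the following Python builds one GraphicsMagick command line per page with a single shared fuzz value. Everything specific in it is an assumption for illustration: the file names, the 15% fuzz, the 50% threshold, the blue "#3060c0", and the exact ordering of operators are not taken from the real workflow, so check them against the GraphicsMagick documentation before running anything.

```python
import shlex


def separation_command(src, dst, spot, fuzz="15%"):
    """Argument list for one page: drop black, turn the spot colour black, threshold."""
    return ["gm", "convert", src,
            "-fuzz", fuzz,
            "-fill", "white", "-opaque", "black",   # black text drops out
            "-fill", "black", "-opaque", spot,      # spot colour becomes ink
            "-threshold", "50%",
            dst]


# Hypothetical page names; one fuzz value is reused for every page.
pages = [f"page{n:03d}.png" for n in range(1, 4)]
commands = [separation_command(p, p.replace(".png", "-blue.png"), "#3060c0")
            for p in pages]

for cmd in commands:
    # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the argument lists separately from running them makes it easy to inspect the commands on a page or two before committing to the whole document.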
PNGs are compressed with deflate. When the spot colours applied to text
in the document, my first thought was that I could save a bunch of
Group4 TIFFs, one for each colour, and mask those into the PDF, because
Group4 compression is impressive for text. It took some frustrating
experiments before I realised that Group4 compression isn't defined for
two-colour images in general; it is specifically for images that are
black and white, and PDF won't let you circumvent that!
I've just scanned another document with some blue diagrams and table
backgrounds, if you'd like to see an example:
I might reprocess this later, but for now, I didn't even bother
separating out pages that contain blue from ones that don't; every page
has a blue layer, even if it's blank. If you're wide awake, you may
spot that the blue layer on page 41 doesn't extend to the bottom of the
table. This isn't a processing flaw; the document is actually printed
that way.