Scanning Suggestions (Bookmarks & Colour)
Antonio Carlini
a.carlini at ntlworld.com
Fri Aug 27 15:50:14 CDT 2021
I have a few manuals to scan and I'm looking for suggestions about how
to add bookmarks and how to handle colour.
Bookmarks should be easier, so let's start with that. I want to add
bookmarks (or whatever they are called) so that it is easy to navigate
to page "2-48" or "C-17" in a document. Many of the PDFs on bitsavers
have that and I've found it very useful, so I'd like to do the same for
my future scans. I've tried pdftk (the Java port, as the original is no
longer available on my distro), but that failed. So I tried Ghostscript,
and that also failed, while also rewriting the PDF to be considerably
larger. Is there a simple way to achieve this (ideally from the CLI)?
Now for the scanning itself.
For manuals that are simple monochrome, I plan to scan at 600 dpi
bilevel, G4-encoded, wrapped in PDF.
For photographs or shaded areas that don't necessarily come out well
under those settings, I plan to use 8-bit greyscale. I'd prefer to use
600 dpi, but I may have to fall back to 300 dpi if the per-page file size
shoots up too much.
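To put rough numbers on the 600 vs 300 dpi trade-off: halving the resolution quarters the pixel count, and 8-bit greyscale carries eight times the data of bilevel before compression. The helper below computes uncompressed page sizes only; it assumes a US Letter page (8.5 x 11 in), and real G4 or JPEG output will be much smaller, by an amount that depends on the page content.

```python
# Sketch: uncompressed size of one scanned page. This is an upper bound;
# G4 (bilevel) and JPEG (greyscale/colour) compress it substantially.

def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Bytes needed to store the page's pixels with no compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed page size: US Letter, 8.5 x 11 inches.
    for label, dpi, bpp in [
        ("600 dpi bilevel", 600, 1),
        ("600 dpi 8-bit grey", 600, 8),
        ("300 dpi 8-bit grey", 300, 8),
        ("600 dpi 24-bit colour", 600, 24),
    ]:
        size = raw_page_bytes(8.5, 11, dpi, bpp)
        print(f"{label:24s} {size / 1e6:7.1f} MB uncompressed")
```

On those assumptions a 600 dpi greyscale page starts from about 33.7 MB of raw pixels against about 8.4 MB at 300 dpi, which is the factor of four the fallback would buy.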
The real issue is colour. I know that various people have looked at
how to efficiently scan pages that are mostly black and white but have
some coloured text (RSX-11 manuals and early VMS manuals did this to
highlight terminal input, for example). I don't think this is a solved
problem and I'm not expecting a solution. What I'm really looking for
is confirmation that what I'm about to produce will contain all the
information that a future efficient algorithm is likely to need.
I'm going to start by scanning the whole manual as though it had no
colour (i.e. 600 dpi bilevel, G4-encoded, except for pages with photos,
shading and so on). Then I'm going to go back, rescan the pages that
have colour at 600 dpi and save those as JPEG. Then I'll produce a final
PDF with the colour pages inserted. I'll also produce a PDF containing
the B&W pages that were replaced by colour pages (I assume OCR will be
better served by non-jaggy scans).
So the final outputs will be:
manual.pdf - the whole manual, including whole pages scanned as colour
if any colour is present on them
manual_BW.pdf - the G4-encoded bilevel pages that were replaced by
colour pages
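The bookkeeping for those two outputs is simple but easy to get wrong by hand: every page appears once in manual.pdf, taken from the colour rescan if one exists, and every replaced bilevel page goes into manual_BW.pdf. The sketch below builds both ordered file lists. The directory layout and file names (bw/page-NNNN.tif, colour/page-NNNN.jpg) are hypothetical, and the lists would still need to be handed to whatever image-to-PDF wrapper is actually used.

```python
# Sketch: decide which scan file goes where in the two output PDFs.
# File naming is hypothetical: bw/page-0001.tif (G4 bilevel or greyscale)
# and colour/page-0001.jpg (colour rescans), numbered by physical page.

def plan_files(page_count, colour_pages):
    """Return (manual_pages, manual_bw_pages) as ordered file name lists."""
    colour = set(colour_pages)
    out_of_range = sorted(p for p in colour if not 1 <= p <= page_count)
    if out_of_range:
        raise ValueError(f"colour pages out of range: {out_of_range}")

    # manual.pdf: every physical page, preferring the colour rescan.
    manual = [
        f"colour/page-{p:04d}.jpg" if p in colour else f"bw/page-{p:04d}.tif"
        for p in range(1, page_count + 1)
    ]
    # manual_BW.pdf: only the bilevel pages that were replaced by colour.
    manual_bw = [f"bw/page-{p:04d}.tif" for p in sorted(colour)]
    return manual, manual_bw


if __name__ == "__main__":
    manual, manual_bw = plan_files(6, [2, 5])
    print("manual.pdf:   ", " ".join(manual))
    print("manual_BW.pdf:", " ".join(manual_bw))
```

Keeping the physical page number in every file name also means the page labels for manual.pdf can be computed from the same numbering.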
Thanks
Antonio
--
Antonio Carlini
antonio at acarlini.com