Scanning Suggestions (Bookmarks & Colour)
Astrid at xrtc.net
Sat Aug 28 03:21:06 CDT 2021
i've achieved satisfactory results palettizing scans of low-color-depth material using a tool called 'noteshrink':
æstrid smith (she/her)
=<[ c y b e r ]>=
antique telephone collectors association member #4870
On Fri, Aug 27, 2021, at 13:50, Antonio Carlini via cctalk wrote:
> I have a few manuals to scan and I'm looking for suggestions, about how
> to add bookmarks and how to handle colour.
> Bookmarks should be easier, so let's start with that. I want to add
> bookmarks (or whatever they are called) so that it is easy to navigate
> to page "2-48" or "C-17" in a document. Many of the PDFs on bitsavers
> have that and I've found it very useful so I'd like to do that for my
> future scans. I've tried with pdftk (the Java port as the original is no
> longer available on my distro) but that failed. So I tried GhostScript
> and that also failed, while also rewriting the PDF to be considerably
> larger. Is there a simple way to achieve this (ideally from the CLI)?
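One possible route from the CLI, offered as a sketch rather than a confirmed fix for the failure described above: pdftk's update_info operation reads bookmark records in the same text layout that dump_data writes. The helper below only generates that text. The function name and the label-to-page mapping are hypothetical; the page numbers are physical 1-based page positions in the PDF, not the printed labels.

```python
def pdftk_bookmarks(entries):
    """Render (title, level, page) tuples as pdftk bookmark records."""
    lines = []
    for title, level, page in entries:
        lines += [
            "BookmarkBegin",
            f"BookmarkTitle: {title}",
            f"BookmarkLevel: {level}",
            f"BookmarkPageNumber: {page}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical mapping: printed page label -> physical page in the PDF.
entries = [("2-48", 1, 60), ("C-17", 1, 231)]
print(pdftk_bookmarks(entries), end="")
```

Saving the output as bookmarks.txt and running "pdftk manual.pdf update_info bookmarks.txt output manual_bm.pdf" should then attach the bookmarks. Whether this behaves the same on the Java port is an assumption worth checking on a small test file first.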
> Now for the scanning itself.
> For manuals that are simple monochrome, I plan to scan at 600dpi bilevel
> G4 encoded, wrapped in PDF.
> For photographs or shaded areas that don't necessarily come out well
> under those settings, I plan to use 8-bit greyscale. I'd prefer to use
> 600dpi but I may have to fall back to 300dpi if the per-page file size
> shoots up too much.
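To put rough numbers on the 600dpi versus 300dpi trade-off, here is a small sketch of the uncompressed per-page sizes. It assumes a US Letter page (8.5 x 11 inches), which is not stated in the thread, and it says nothing about compressed sizes, which depend on the content and the encoder.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: US Letter, 8.5 x 11 inches.
for label, dpi, bpp in [
    ("600dpi bilevel", 600, 1),
    ("300dpi 8-bit grey", 300, 8),
    ("600dpi 8-bit grey", 600, 8),
    ("600dpi 24-bit colour", 600, 24),
]:
    print(f"{label}: {raw_page_bytes(8.5, 11, dpi, bpp) / 1e6:.1f} MB raw")
```

Halving the resolution quarters the raw size, so 300dpi greyscale starts at about twice the raw size of a 600dpi bilevel page, while 600dpi greyscale starts at eight times.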
> The real issue is colour. I know that various people have looked at the
> issue of how to efficiently scan pages that are mostly black and white
> but have some coloured text (RSX-11 manuals and early VMS manuals did
> this to highlight terminal input, for example). I don't think this is a
> solved problem and I'm not expecting a solution, what I'm really looking
> for is to check that what I'm about to produce will have all the
> information that a future efficient algorithm is likely to need.
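For pages that are black and white plus one highlight colour, the basic idea behind palettizing can be sketched as mapping every pixel to the nearest entry in a tiny fixed palette. This is only an illustration: it is not a description of how noteshrink works internally, and the palette values (in particular the red) are made-up placeholders that would need to be sampled from a real scan.

```python
# Hypothetical three-colour palette: paper, ink, and one highlight colour.
PALETTE = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (200, 30, 30),
}


def nearest(pixel, palette=PALETTE):
    """Return the name of the palette colour closest to an (r, g, b) pixel."""
    def dist2(colour):
        # Squared Euclidean distance in RGB space.
        return sum((a - b) ** 2 for a, b in zip(pixel, colour))
    return min(palette, key=lambda name: dist2(palette[name]))


def page_has_colour(pixels, palette=PALETTE):
    """True if any pixel maps to something other than black or white."""
    return any(nearest(p, palette) not in ("black", "white") for p in pixels)


print(nearest((250, 248, 240)), nearest((20, 20, 25)), nearest((180, 40, 50)))
```

A check like page_has_colour could also help decide which pages need a colour rescan at all, though real scans would need some tolerance for paper tint and scanner noise.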
> I'm going to start by scanning the whole manual as though it had no
> colour (so 600 dpi bilevel G4 encoded, except for pages with photos and
> shading and so on). Then I'm going to go back and rescan the pages that
> have colour and scan those at 600 dpi and save as a JPG. Then I'll
> produce a final PDF with the colour pages inserted. I'll also produce a
> PDF with the B&W pages that were replaced by colour pages (I assume OCR
> will be better served by non-jaggy scans).
> So the final outputs will be:
> manual.pdf - the whole manual, including whole pages scanned as colour
> if any colour is present on them
> manual_BW.pdf - the G4-encoded bilevel pages that were replaced by
> colour pages
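The two outputs listed above can be planned as a simple page map before any PDF tool is involved. The sketch below assumes pages are numbered from 1 in scan order and that the colour pages are known by number; the function name and the "bw"/"colour" source labels are hypothetical.

```python
def assembly_plan(total_pages, colour_pages):
    """Return (manual, manual_bw) page plans.

    manual    : list of (source, page) for every page, where source is
                "colour" for rescanned pages and "bw" otherwise.
    manual_bw : the bilevel pages that the colour rescans replaced.
    """
    colour = set(colour_pages)
    pages = range(1, total_pages + 1)
    manual = [("colour" if p in colour else "bw", p) for p in pages]
    manual_bw = [p for p in pages if p in colour]
    return manual, manual_bw


manual, manual_bw = assembly_plan(6, [2, 5])
print(manual)
print(manual_bw)
```

Keeping the original page numbers in manual_BW.pdf (for example in its bookmarks or file naming) would let a later tool pair each bilevel page with its colour counterpart.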
> Antonio Carlini
> antonio at acarlini.com