Linearizing PDF scans

Al Kossow aek at bitsavers.org
Sat Aug 14 13:04:10 CDT 2021


On 8/13/21 3:15 PM, J. David Bryan via cctech wrote:
> On Friday, August 13, 2021 at 17:23, Alexandre Souza wrote:
> 
>> Is there any kind of standard, recommendation, group, or mailing list
>> for discussing the subject?
> 
> I am not aware of any.  I started with Al Kossow's basic recommendations,
> modified slightly:
> 
>    - scan at 600 dpi
>    - use TIFF G4 where feasible
>    - use tumble to convert to PDF
> 
> I then wrote and use a couple of simple image-processing utilities based on
> the Leptonica image library:
> 
>    http://www.leptonica.org/
> 
> ...to clean up the scans (the library makes the programs pretty trivial).
> They start with the raw scans and:
> 
>    - mask the edges to remove hole punches, etc.
>    - size to exactly 8.5" x 11" (or larger, for fold-out pages)
>    - remove random noise dots (despeckle)
>    - rotate to straighten (deskew)
>    - descreen photos on pages into continuous-tone images
>    - quantize and solidify screened color areas into solid areas
>    - assign page numbers and bookmarks in the PDF
> 
> A good example PDF produced by these programs is:
> 
>    http://www.bitsavers.org/pdf/hp/64000/software/64500-90912_Mar-1986.pdf
> 
> The cover is a "solidified" black/gray/white image, manual pages 1-2 and
> 1-4 are continuous-tone JPEG images overlaying bilevel text images, and the
> rest of the pages are masked, deskewed, bilevel text images.  The PDF
> bookmarks and logical page numbers are auto-generated from the original
> scan filenames.
> 
> The final step is linearizing the PDFs, but I'm wondering whether this is
> still useful.
> 
>                                        -- Dave
> 
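For readers unfamiliar with the cleanup steps quoted above, here is a minimal pure-Python sketch of two of them: working out the pixel size of an 8.5" x 11" page at 600 dpi, and despeckling a bilevel image by erasing small connected components. This is not Dave's Leptonica-based code. The function names (page_pixels, despeckle) and the max_speck threshold are assumptions made for illustration, and a real tool would work on packed image data rather than lists of 0/1 values.

```python
from collections import deque


def page_pixels(width_in, height_in, dpi=600):
    """Pixel dimensions of a page of the given size (inches) at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def despeckle(image, max_speck=4):
    """Return a copy of a bilevel image with small black specks erased.

    image is a list of rows of 0/1 values, where 1 means black.
    Any 8-connected group of black pixels containing at most max_speck
    pixels is treated as noise and set to white. max_speck is a
    hypothetical tuning knob, not a value taken from the original post.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [list(row) for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill to collect one connected component of black pixels.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small components are assumed to be noise and are erased.
            if len(component) <= max_speck:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

At 600 dpi a letter-size page comes out to 5100 x 6600 pixels, which is why a per-pixel Python loop like this is only suitable for explaining the idea; a library such as Leptonica does the same kind of work far faster.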

Jay Jager is trying to scan manuals that have colored text and backgrounds.
Is your workflow for handling those available somewhere?
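
To make the "quantize and solidify" idea from the quoted list concrete, here is a rough pure-Python sketch under stated assumptions: each pixel is snapped to the nearest color in a small palette chosen by hand, then a 3x3 majority filter removes isolated pixels left over from the print screen. This is a simplified stand-in, not Dave's actual workflow. The names quantize and solidify, the palette, and the 3x3 window are all illustrative choices.

```python
from collections import Counter


def nearest_color(pixel, palette):
    """Return the palette entry closest to pixel (squared RGB distance)."""
    return min(palette,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))


def quantize(image, palette):
    """Map every (r, g, b) tuple in image to its nearest palette color."""
    return [[nearest_color(p, palette) for p in row] for row in image]


def solidify(image):
    """Replace each pixel with the most common color in its 3x3 neighborhood.

    The window is clipped at the image edges. Ties go to whichever color
    is encountered first in the window, which is an arbitrary choice.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            votes = Counter(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
            row.append(votes.most_common(1)[0][0])
        out.append(row)
    return out
```

Usage would look like solidify(quantize(scan, [(255, 255, 255), (200, 0, 0)])) for a page with red areas on white, where scan is a list of rows of RGB tuples. Choosing the palette per manual is the manual step in this sketch.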


