Linearizing PDF scans

ben bfranchuk at jetnet.ab.ca
Sat Aug 14 13:22:36 CDT 2021


On 2021-08-14 12:04 p.m., Al Kossow via cctalk wrote:
> On 8/13/21 3:15 PM, J. David Bryan via cctech wrote:
>> On Friday, August 13, 2021 at 17:23, Alexandre Souza wrote:
>>
>>> Is there any kind of standard, recommendation, group, or mailing list
>>> to discuss the subject?
>>
>> I am not aware of any.  I started with Al Kossow's basic recommendations,
>> modified slightly:
>>
>>    - scan at 600 dpi
>>    - use TIFF G4 where feasible
>>    - use tumble to convert to PDF
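
For a sense of scale, the 600 dpi figure above can be turned into pixel and
byte counts. This is only a back-of-the-envelope sketch in Python: the
function name and the letter-size defaults are assumptions made for
illustration and are not part of anyone's tool chain.

```python
def bilevel_page_stats(width_in=8.5, height_in=11.0, dpi=600):
    """Return (width_px, height_px, raw_bytes) for a 1-bit-per-pixel page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each row of a packed bilevel image is padded to a whole number of bytes.
    row_bytes = (width_px + 7) // 8
    return width_px, height_px, row_bytes * height_px


if __name__ == "__main__":
    w, h, raw = bilevel_page_stats()
    print(f"{w} x {h} pixels, {raw} bytes uncompressed")
```

A letter-size page comes out at 5100 x 6600 pixels, a little over 4 MB
before compression, which is why a lossless bilevel coding such as G4
matters for a manual of any length.
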
>>
>> I then wrote and use a couple of simple image-processing utilities
>> based on the Leptonica image library:
>>
>>    http://www.leptonica.org/
>>
>> ...to clean up the scans (the library makes the programs pretty trivial).
>> They start with the raw scans and:
>>
>>    - mask the edges to remove hole punches, etc.
>>    - size to exactly 8.5" x 11" (or larger, for fold-out pages)
>>    - remove random noise dots (despeckle)
>>    - rotate to straighten (deskew)
>>    - descreen photos on pages into continuous-tone images
>>    - quantize and solidify screened color areas into solid areas
>>    - assign page numbers and bookmarks in the PDF
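
To make one of the steps above concrete, here is a minimal sketch of the
despeckle idea in plain Python. It is not Dave's program and it does not use
Leptonica: the function name, the list-of-lists image format, and the
min_pixels threshold are all assumptions chosen for illustration. It simply
erases any connected blob of black pixels smaller than the threshold.

```python
from collections import deque


def despeckle(image, min_pixels=4):
    """Return a copy of a bilevel image with small black blobs removed.

    image is a list of rows, each a list of 0 (white) or 1 (black).
    Any 8-connected group of black pixels smaller than min_pixels is erased.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Blobs below the threshold are treated as noise and whitened.
            if len(component) < min_pixels:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result
```

At 600 dpi a real threshold would have to be tuned by eye, since periods
and the dots on letters are small blobs too; a library routine will also be
far faster than this on a full page.
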
>>
>> A good example PDF produced by these programs is:
>>
>>    http://www.bitsavers.org/pdf/hp/64000/software/64500-90912_Mar-1986.pdf
>>
>> The cover is a "solidified" black/gray/white image, manual pages 1-2 and
>> 1-4 are continuous-tone JPEG images overlaying bilevel text images,
>> and the rest of the pages are masked, deskewed, bilevel text images.
>> The PDF
>> bookmarks and logical page numbers are auto-generated from the original
>> scan filenames.
>>
>> The final step is linearizing the PDFs, but I'm wondering whether this is
>> still useful.
>>
>>                                        -- Dave
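
For anyone wanting to see whether a given PDF is already linearized, here is
a rough heuristic sketched in Python. It relies on the PDF specification
placing the linearization parameter dictionary (the one carrying the
/Linearized key) within the first 1024 bytes of the file. The function name
is made up for this example, and the test only detects the marker; it does
not validate the hint tables the way a real tool would.

```python
import re


def looks_linearized(pdf_bytes):
    """Heuristic: True if the head of the file carries a /Linearized key."""
    head = pdf_bytes[:1024]
    # A PDF header must be present near the start of the file.
    if b"%PDF-" not in head:
        return False
    # The linearization dictionary looks like: << /Linearized 1 /L ... >>
    return re.search(rb"/Linearized\s+\d", head) is not None


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, looks_linearized(handle.read(1024)))
```

A file that was linearized and later modified with an incremental update
can still pass this test, so a proper check such as qpdf's
--check-linearization option is the safer route for anything that matters.
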
I tend to keep my PDFs on portable devices, so the PDFs need to be easy to
use on those devices.
Ben.

