Converting Documents

9 Apr 2020

On 2020-04-09 10:16 AM, emanuel stiebler via cctalk wrote:
...
  Hi All,
 somebody scanned documents for me in .pdfs.
 Looking into them, they are pages of jpgs embedded in .pdf ..
 (100 pages resulting in 350MBytes ...)
 Any easy way to convert them into some b/w .pdf file?
 It is all text, no drawings ...
 Pointers?
 Thanks

Typically I extract the embedded images using pdfimages:
$ pdfimages
pdfimages version 4.00
Copyright 1996-2017 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
You can then use GraphicsMagick to threshold each image to bilevel (a
suitable threshold can be found by inspecting the image or its histogram,
e.g. in Photoshop):
  gm mogrify -threshold XX% -monochrome <image-files>
(or `gm convert` can convert each page to TIFF for the next step)
Then I'd go via TIFF, combining all pages into one file with G4
compression using `tiffcp -c g4`. If you want a PDF instead of a
multipage TIFF, you can then transcode it to PDF with `tiff2pdf`.
tiffcp and tiff2pdf are libtiff utilities.
There might be a shortcut using different tools but those are the tools
I use.
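
The steps above could be strung together roughly as follows. This is only a sketch: the function name, the working directory, the default 50% threshold and the DRY_RUN switch are made up for illustration, and it assumes every page comes out of pdfimages as a JPEG (the -j option writes embedded JPEG images as .jpg files). By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: scanned PDF (JPEG pages) -> bilevel G4 TIFF -> small b/w PDF.
# Hypothetical names: convert_scan, run, DRY_RUN, WORKDIR.

DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands only, 0 = run them

run() {
    # Show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

convert_scan() {
    in_pdf="$1"
    out_pdf="$2"
    threshold="$3"
    [ -n "$threshold" ] || threshold="50%"   # assumed default; tune per scan
    work="${WORKDIR:-./pages}"

    run mkdir -p "$work"

    # 1. Extract the embedded page images (JPEGs are written as .jpg).
    run pdfimages -j "$in_pdf" "$work/page"

    # 2. Threshold each page to bilevel and write it out as TIFF.
    for img in "$work"/page-*.jpg; do
        run gm convert "$img" -threshold "$threshold" -monochrome "${img%.jpg}.tif"
    done

    # 3. Combine all pages into one multipage TIFF with G4 compression.
    run tiffcp -c g4 "$work"/page-*.tif "$work/all.tif"

    # 4. Wrap the multipage TIFF in a PDF.
    run tiff2pdf -o "$out_pdf" "$work/all.tif"
}
```

After sourcing the file, something like `DRY_RUN=0 convert_scan scan.pdf scan-bw.pdf 55%` would actually run the tools. In dry-run mode the page-*.jpg pattern is printed unexpanded, since no images have been extracted yet.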
--Toby
