Jim wrote:
That is one of the MISuses of PDF. PDF should not be used as a container
for bitmap images.
Why? What better open-standard file format can store a lot of pages using
lossless bilevel compression? PDF can store the original bitmaps
(well-compressed) together with the OCR results, so that you can have
mostly-searchable files that still look like the original document. (As
opposed to typical OCR files that are completely screwed up and lose
information.)
And PDF can support a mix of bilevel and greyscale or color images in
the same document, or even on the same page.
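
To make the bitmap-plus-OCR idea concrete, here is a minimal sketch in
Python that writes a one-page PDF holding a 1-bit image with an invisible
text layer on top (text rendering mode 3), which is how a searchable
scanned page is typically laid out. Everything in it is an assumption
for illustration: the function name build_pdf, the stripe test pattern,
and the "Hello" string stand in for a real scan and real OCR output, and
it uses Flate (zlib) only because the Python standard library has no
CCITT Group 4 or JBIG2 encoder; a real tool would use one of those
filters for bilevel data.

```python
import zlib


def _stream(dict_entries: bytes, data: bytes) -> bytes:
    """Wrap data as a PDF stream object body with the given dictionary entries."""
    return (b"<< " + dict_entries + b" /Length %d >>\nstream\n" % len(data)
            + data + b"\nendstream")


def _pdf_string(text: str) -> bytes:
    """Escape an ASCII string for use as a PDF literal string."""
    raw = text.encode("ascii")
    raw = raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
    return b"(" + raw + b")"


def build_pdf(width: int = 16, height: int = 16, ocr_text: str = "Hello") -> bytes:
    """Hypothetical helper: one page, one bilevel image, one invisible text run."""
    # 1 bit per pixel, each row padded to a whole byte. In DeviceGray,
    # a 0 bit is black and a 1 bit is white. This is a stand-in stripe pattern.
    row_bytes = (width + 7) // 8
    raw = b"".join(
        bytes([0xAA if y % 2 == 0 else 0x55]) * row_bytes for y in range(height)
    )
    image = _stream(
        b"/Type /XObject /Subtype /Image /Width %d /Height %d "
        b"/ColorSpace /DeviceGray /BitsPerComponent 1 /Filter /FlateDecode"
        % (width, height),
        zlib.compress(raw),  # lossless; stand-in for CCITT G4 or JBIG2
    )

    # Page content: paint the image scaled to the page, then place the
    # "OCR" text with rendering mode 3 (neither filled nor stroked), so it
    # is present for search and selection but does not show.
    content = _stream(
        b"",
        b"q %d 0 0 %d 0 0 cm /Im0 Do Q\n" % (width, height)
        + b"BT 3 Tr /F1 8 Tf 1 4 Td " + _pdf_string(ocr_text) + b" Tj ET\n",
    )

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
        b"/Resources << /XObject << /Im0 4 0 R >> /Font << /F1 5 0 R >> >> "
        b"/Contents 6 0 R >>" % (width, height),
        image,
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        content,
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)


if __name__ == "__main__":
    with open("scan_sketch.pdf", "wb") as f:
        f.write(build_pdf())
```

A greyscale or color image on the same page would just be another image
XObject with a different /ColorSpace and /BitsPerComponent, painted by
its own Do operator in the same content stream.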
In case it wasn't obvious, PDF *is* Postscript! It's *portable*
postscript.
Speaking as someone who has written software to read and write both
Postscript and PDF, I can tell you in no uncertain terms that PDF is
NOT Postscript. PDF happens to use a subset of the Postscript
imaging model, and has superficially similar syntax in some areas,
but that's about as close as they get.
The best format for mixed text and graphics that do NOT need to be indexed
(i.e. converted to text) is DjVu ("DejaVu"). DjVu is a mixed-type format
that keeps a B&W image of the text and a color or grayscale image of
everything else in the same file, and each layer is compressed with a
method appropriate to its content type.
That's entirely possible with PDF as well. There are just relatively
few tools that can do it, just as there are few tools that can do it
in DjVu format.
Unfortunately, the best DjVu tools cost significant money, so it hasn't
taken off.
Since PDF can do the same things, there seems to be little advantage
to using DjVu instead.
Eric