On May 13, 2007, at 6:44 PM, Richard wrote:
Umm. Have you ever googled for something and had it turn up PDF
files? I'd say fully a third of them that I run across contain
images of pages, not text.
How do you know it doesn't contain text?
Because in several instances in which I've downloaded the
referenced PDF file and torn it apart (to re-compress, trim out cover
pages that weren't present in the original document, etc.), they've
contained only images.
As described earlier in this
thread, the PDF file can be constructed with a "hidden" text layer
that is OCR'ed from the images so that text searches work.
Yup. That's a damn nice feature of the PDF format if you ask me.
I bet that Google just searches embedded text, because it never finds
anything on bitsavers when you search for text.
I'm sure it does that too.
-Dave
--
Dave McGuire
Port Charlotte, FL