Message: 8
Date: Sat, 12 May 2007 11:49:02 +0100
From: Paul Williams <paul at frixxon.co.uk>
Subject: Re: Manuals being scanned
To: General Discussion: On-Topic and Off-Topic Posts
<cctalk at classiccmp.org>
Message-ID: <46459B9E.4040706 at frixxon.co.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Al Kossow wrote:
>
> They are some of the ugliest, bloated pdf's I've ever seen.
> What are you doing to them to make them look so BAD.
This is Adobe Capture at work. I had a look at the CP/M manual to see if
the OCRed text was overlaid on an original full page image beneath, but
unfortunately, the 256-page document consists of 13441 image fragments
that couldn't be OCRed. There are no images where text has been OCRed.
--
Paul
Paul -
in all kindness (I've been there, done that)
Stop scanning for a bit and consider:
1) your current technique is not good!
a) your current files are indeed much too large
by maybe a factor of 6 or 7
b) your current files contain OCR errors - with no warning
to users, who tend to think .pdf means a valid image.
Two easily found errors are on "your"
http://www.cse.uta.edu/TheMuseum@CSE/Manuals%20Scanned/CDC/Control%20Data-Cyber%2070%20Computer%20Systems%20Models%2072,73,74,6000%20Computer%20Systems.PDF
on page 3-7 or page 19 in the file
In HTML terms, several variables that should be
l<sub>i</sub>
have been mis-OCRed and are shown as
I,
Adobe blunders such as the above caused me to abandon
Adobe for several years, until I met an Adobe employee
who straightened me out.
I now use Adobe Acrobat Professional as scanning input control,
(not optimum, but it works OK)
- trim off the images of paper holes with "Crop Pages"
- check "Recognize Text using OCR",
which places Adobe's interpretation "behind" the image
(for direct searching)
- and I have figured out how to make "Reduce File Size" work
Another thing to consider is that GOOGLE OCRs .pdf
files that it spiders, highlighting search hits :-))
To some extent, your Adobe OCR is bypassed by GOOGLE.
So - I'm back "PDF"ing,
http://www.ed-thelen.org/comp-hist/on-line-docs.html
http://www.ed-thelen.org/#h-documents
etc.
For serious work, I suggest you look at Al Kossow's suggestions at
http://www.bitsavers.org/
"Keep Smiling" ;-))
Ed Thelen