I started answering offline but perhaps someone else is interested
too, so here's my answer for the list.
paul
From: Paul Koning <pkoning(a)equallogic.com>
To: rigdonj(a)cfl.rr.com
Date: Tue, 2 Mar 2004 09:27:52 -0500
Subject: Re: OCR of 644 TIFF files?
>>>> "Joe" == Joe R <rigdonj(a)cfl.rr.com> writes:
Joe> Hi Paul, How much does it cost for Adobe's OCR download and what
Joe> versions of Acrobat does it work with?
Hi Joe... I was about to answer your query...
The OCR in Acrobat is a plugin; I downloaded it from the Adobe
website. It plugs into the full (not Reader-only) Acrobat, which I
think costs a bit over $100. I bought version 5; that version is out
of date at this point, but I don't know which one is current.
Another point to watch for: I do not know if Adobe still offers that
plugin at no charge. The free version I have is limited to 50 pages
at a time -- but you can do larger documents by breaking them up into
50-page chunks and re-merging them, which isn't hard to do with
Acrobat. (Drag & drop does it.) There is also a no-limits version of
that plugin, but it's obviously aimed at lawyers and the like, people
with way too much money. I forget the price but I think it was over
$1000!
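To make the 50-page chunking concrete, here is a minimal sketch in
Python of how the page ranges work out. It only does the arithmetic;
it does not talk to Acrobat or the plugin. The function name
page_chunks and its parameters are made up for illustration, and the
default of 50 is simply the limit of the free plugin mentioned above.

```python
def page_chunks(total_pages, chunk_size=50):
    """Return inclusive, 1-based (first, last) page ranges.

    chunk_size defaults to 50, the per-run limit of the free plugin
    described above; both names are hypothetical, not an Adobe API.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Step through the document one chunk at a time; the final chunk
    # is cut short at the last page.
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


if __name__ == "__main__":
    for first, last in page_chunks(60):
        print(f"OCR pages {first}-{last}")
```

For a 60-page document this gives two runs, pages 1-50 and 51-60,
which then get merged back together in order.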
The other issue with the plugin is that it produces a PDF file as
output, which is fine if that's what you want. But a PDF file is NOT
really editable, so if the results aren't quite what you want, it's
very painful to tweak them. I did that on a 60-page document. Acrobat
(full version) lets you edit a PDF file, for example to fix font
glitches, but ONLY ONE LINE AT A TIME!
A long time ago I used a "light" version of some commercial PC OCR
program. I don't remember the name; perhaps I'll think of it later or
find it somewhere. Pagesomething... That was OK, and more flexible,
since it produced output that Word can use, and paragraphs actually
came across as paragraphs rather than as individual lines. I don't
remember how it did with mixed documents -- graphics plus text. The
nice thing about the Adobe OCR is that it handles mixed pages well.
As for scanning -- I seem to remember that you can scan in Acrobat.
At this point I've forgotten how I did the original scan of the
document I processed. As I said, that was about 60 pages (the original
Ethernet spec, as a matter of fact). I may have scanned it page by
page in Photoshop, then dragged/dropped the TIF files into Acrobat.
Come to think of it, I also did OCR on a significantly larger
document: a PDF file of page scans of a flight manual. I had Acrobat
export the pages (result: 300 TIF files), ran them through Photoshop
for contrast, clipping off crud at the edges, etc., then imported them
back into Acrobat into a new document, which I then fed to the OCR
machine. I now have the same manual with its text looking somewhat
ratty, but fully searchable and much smaller. Ratty, because the OCR
doesn't always identify the font correctly, so you tend to have fonts
and type sizes mixed in a line when they weren't in the original.
paul