Tom Gardner wrote:
Try ABBYY FineReader,
http://finereader.abbyy.com/ or
http://www.abbyyusa.com/adx/aspx/adxGetMedia.aspx?DocID=2314
I expect the US$50 Express version will do what you want. I have been a
very happy user of the PRO version for many years.
Tom
I second this recommendation. ABBYY FR is a great product and works
fantastically. I don't like many of the auto-modes though, and prefer
to select each region of every page manually. This involves clicking a
tool, highlighting the area of interest, and telling ABBYY whether it
is a graphic or a text box. This process is mind-numbing, but
gratifying once it is finished.
The OCR is very good, but depending on the type of text, you'll still
get roughly a 1-5% error rate. This doesn't sound like a lot, but it
ends up being 4 or 5 corrections per page. Some pages are perfect and
don't require any correction. Proofreading the OCR'd text is absolutely
required for a solid finished product.
You can almost always tell why ABBYY didn't read something: there is
usually a noticeable defect in the original text. As you might expect,
it occasionally confuses the letter "O" with the number zero, and the
letter "l" with the number 1. If you don't tell it (or it doesn't figure
out) that a portion of the page is a graphic, it will try to OCR the
graphic, with worthless although interesting results. :)
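Since the O/0 and l/1 mix-ups are the predictable ones, a small script
can point you at likely spots before you proofread by eye. This is only
a rough sketch, assuming you have exported the OCR result as plain text;
the function name and the rule (flag any token that mixes digits with
the letters O, o, l or I) are my own, not anything ABBYY provides. It
will throw false positives on things like "Vol2", and it does not
replace reading the page.

```python
import re

# Heuristic (an assumption, not an ABBYY feature): a token that contains
# at least one digit AND at least one look-alike letter is suspicious.
SUSPECT = re.compile(r"\b(?=\w*\d)(?=\w*[OolI])\w+\b")

def flag_suspect_tokens(text):
    """Return (line_number, token) pairs worth a second look."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECT.finditer(line):
            hits.append((lineno, match.group()))
    return hits

if __name__ == "__main__":
    sample = "Model 1O0 rated at l5 volts\nAll good here"
    for lineno, token in flag_suspect_tokens(sample):
        print(f"line {lineno}: check '{token}'")
```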
ABBYY suggests 400dpi minimum, but I usually use 600dpi. I've found
that I get pretty good results with 600. There are definitely more
errors under 300. I scan to TIFF, of course. :)
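To get a feel for what the higher resolution costs, here is a quick
back-of-the-envelope calculation. It assumes a US Letter page (8.5 x 11
inches) scanned as 1-bit black and white with no compression; those are
my assumptions for illustration, and a grayscale or compressed TIFF will
come out differently.

```python
def scan_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px, uncompressed_bytes) for a 1-bit scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # 1 bit per pixel, with each row padded up to a whole byte
    row_bytes = (width_px + 7) // 8
    return width_px, height_px, row_bytes * height_px

if __name__ == "__main__":
    # Page size is an assumption (US Letter); change to match your originals.
    for dpi in (300, 400, 600):
        w, h, size = scan_dimensions(8.5, 11, dpi)
        print(f"{dpi} dpi: {w} x {h} px, about {size / 1_000_000:.1f} MB raw")
```

Doubling the resolution from 300 to 600 roughly quadruples the pixel
count, which is the price paid for the cleaner recognition.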
I've had no luck "training" ABBYY to learn the text. It seems like a
never-ending challenge, and doesn't improve recognition much that I can
tell.
Regarding Adobe doing the OCR: their OCR is decent, but nowhere near
ABBYY's. Adobe OCR is good for keeping the original scanned graphic and
making it searchable. But if you entirely recreate the original document
from the recognized text (as simple as unchecking a box or something),
you'll notice there are simply way too many errors. The tools for
repairing the PDFs are pretty arcane, and Adobe is just a pain in the
butt for this type of stuff.
While not perfect, here is an example PDF made with ABBYY:
http://www.techtravels.org/tech/BrianInstrumentsManuals.pdf
Hope this helps.
Keith