Folks on this list have expressed strong opinions in the past
about how best to OCR important old documents.
Recent donations to the DECUS archives here have included
several important abstract books that I want to link electronically
(i.e. via HTTP) to the files already rescued from magnetic or
punched paper media. (See, for example,
http://pdp-10.trailing-edge.com/www/lib10/index.html
for the TOPS-10 abstracts linked to the files from DECUS library
tapes.)

I've spent the past weekend playing with scanners and OCR on PC clones
running various Microsoft OSes. And, of course, I'm extremely unhappy
with the point-and-drool misery of doing all of this: there's
no reason I should have to click with the mouse for every single page
I need converted.

So the question: does anyone have recommendations (preferably for
freeware, though I would be willing to spend a few hundred dollars for
good tools, too) for software that will run already-scanned
GIFs/TIFFs/bitmaps through the OCR process unattended? Ideally
there'd be a command-line interface, something like

    ocr page*.gif > bigoutputfile.txt

and ideally it would run under Linux as well, though I wouldn't complain
too strongly if someone recommended a Windows solution.
If it does run under Linux, it'd be *very* nice for batch processing
if it didn't need X11 and mouse-point-and-drool interfacing.
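
To make the request concrete, here is a minimal sketch of the kind of
unattended batch loop described above. It is only an illustration under
stated assumptions: it assumes some command-line OCR program exists that
takes one image file as its last argument and writes the recognized text to
standard output. The name "ocr" is a placeholder taken from the example
command line, not a real tool, and the function name ocr_pages is likewise
made up for this sketch.

```python
#!/usr/bin/env python3
"""Batch driver sketch: run a command-line OCR program over many page images."""
import glob
import subprocess
import sys


def ocr_pages(pattern, ocr_cmd, out):
    """Run ocr_cmd once per file matching pattern, in sorted filename order.

    ocr_cmd is a list such as ["ocr"] (a HYPOTHETICAL program that prints
    recognized text on stdout); the image path is appended as the last
    argument. Text from every page is written to out in order.
    Returns the number of pages processed.
    """
    pages = sorted(glob.glob(pattern))
    for page in pages:
        # check=True stops the batch if the OCR program reports failure.
        result = subprocess.run(ocr_cmd + [page],
                                stdout=subprocess.PIPE, check=True)
        out.write(result.stdout.decode("utf-8", errors="replace"))
    return len(pages)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 batchocr.py 'page*.gif' > bigoutputfile.txt
    # "ocr" is the placeholder command name, assumed to be on the PATH.
    count = ocr_pages(sys.argv[1], ["ocr"], sys.stdout)
    print(f"{count} pages processed", file=sys.stderr)
```

Nothing here needs X11 or a mouse, so it could run from cron or over a
plain terminal session. Pages are processed in sorted filename order, so
zero-padded names (page001.gif, page002.gif, ...) keep the output in page
order.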
Tim. (shoppa(a)trailing-edge.com)