In article <38547.195.212.29.92.1138178440.squirrel at mail.gjcp.net>,
gordonjcp at gjcp.net writes:
OCR is the hard part, and I've yet to hear of anything that comes
even remotely close to acceptable. At, say, 1000 words/page,
a success rate of 99.9% still leaves you with one fix-up
per page. That's a good chunk of work for even a small manual
(say 200 pages). It's a lot of work for an RT-11 manual set
or similar!
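To put rough numbers on that, here is a back-of-envelope sketch in Python. The function name is made up for illustration, and it assumes "success rate" means per-word accuracy and that errors are spread evenly; real OCR errors tend to cluster, so treat the result as an average only.

```python
def expected_fixups(pages, words_per_page=1000, accuracy=0.999):
    """Expected number of misrecognised words left after OCR.

    Assumes accuracy is per word and errors are independent;
    both are simplifications for illustration.
    """
    return pages * words_per_page * (1 - accuracy)


# One page, then the 200-page "small manual" from the post.
print(round(expected_fixups(1)))
print(round(expected_fixups(200)))
```

With the figures quoted above, that works out to about one fix-up per page and about 200 for a 200-page manual.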
Sounds like an ideal thing for a distributed project.
Give everyone who registers a few pages to proofread, then combine them into the
finished work. If you wanted cross-checking you'd just make sure that
different people got different batches at different times, and diff the
results.
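The "diff the results" step could be as simple as the sketch below, which uses Python's standard difflib. It assumes two volunteers have each returned a corrected copy of the same pages as plain text; the function name and the reader labels are hypothetical.

```python
import difflib


def disagreements(proof_a, proof_b):
    """Return unified-diff lines where two proofread copies differ.

    An empty list means the two readers agree on every line.
    """
    return list(difflib.unified_diff(
        proof_a.splitlines(),
        proof_b.splitlines(),
        fromfile="reader_a",   # hypothetical labels
        tofile="reader_b",
        lineterm="",
        n=0,                   # no context lines, only the differences
    ))


copy_a = "The RT-11 manual set\nChapter 1"
copy_b = "The RT-ll manual set\nChapter 1"
for line in disagreements(copy_a, copy_b):
    print(line)
```

Only the lines where the two readers disagree need a third look, which is far less work than rechecking every page.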
Wouldn't this require that everyone have a copy of Acrobat Capture?
No, just scans of the pages and the OCRed text.
One person with Acrobat Capture (or some other scanning and OCR package)
would get it into soft copy, and then bundles of text (say a dozen pages
each) would be sent to contributors.
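As a rough illustration of the bundling, the sketch below splits a page count into bundles of a dozen and hands each bundle to two different readers so the results can be cross-checked. The function names, the contributor names, and the choice of two readers per bundle are all assumptions for the example, not anything specified above.

```python
def make_bundles(page_count, bundle_size=12):
    """Split pages 1..page_count into consecutive bundles."""
    pages = list(range(1, page_count + 1))
    return [pages[i:i + bundle_size]
            for i in range(0, page_count, bundle_size)]


def assign_readers(bundle_count, contributors, readers_per_bundle=2):
    """Round-robin assignment; each bundle gets distinct readers."""
    if readers_per_bundle > len(contributors):
        raise ValueError("not enough contributors for cross-checking")
    return {
        idx: [contributors[(idx + k) % len(contributors)]
              for k in range(readers_per_bundle)]
        for idx in range(bundle_count)
    }


bundles = make_bundles(200)                      # the 200-page manual
readers = assign_readers(len(bundles), ["ann", "bob", "cy"])
print(len(bundles), "bundles; first goes to", readers[0])
```

Each contributor then only needs the page scans and OCRed text for their own bundles.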
Gordon.