At 01:15 AM 12/30/98 -0800, The Sam Ismail wrote:
> The OCR is OK when the text is just normal, and does remarkably well. But
> I need an OCR suite smarter than Xerox's TextBridge Classic. I also need
> some good post-processing software, or at least need to know how to scan a
> simple black & white document without the scanner introducing blotches and
> crap. Any suggestions?
I've used Caere OmniPage in the past, and it seemed pretty good, but
I wasn't trying to scan old computer docs, just nice typewriter pages.
I'm very interested in the collective wisdom about this, so of course
it seems quite on-topic to me. I'd like to scan the ASR-33 Teletype
manuals, which contain plenty of odd hand-set type, drawings, off-size
pages, schematics, etc. I'd also like to restore the UCSD Pascal
manuals; I've heard that the only electronic copies at UCSD were
lost a long time ago.
Given these problems of line art and odd character sets, I suspect
the most useful first step would be to scan all docs at one fixed
resolution, then store them as bitmaps in a format that any present
or future OCR / PDF-ish program can load easily. Someone mentioned
the multi-page TIFF format. As for which resolution, I think 300 DPI
might be too coarse for the odd type and fine line art.
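To put rough numbers on the resolution question, here is a small Python sketch. It assumes a US-letter page (8.5 x 11 inches) stored as an uncompressed 1-bit black & white bitmap; a real TIFF using CCITT Group 4 compression would be far smaller on disk, so treat these as upper bounds, not file sizes.

```python
# Raw size of one scanned page stored as an uncompressed 1-bit bitmap.
# Assumptions: US-letter page (8.5 x 11 in), 1 bit per pixel, no compression.

PAGE_W_IN = 8.5
PAGE_H_IN = 11.0

def page_pixels(dpi):
    """Return (width, height) in pixels for one page scanned at `dpi`."""
    return round(PAGE_W_IN * dpi), round(PAGE_H_IN * dpi)

def raw_bytes_1bit(dpi):
    """Uncompressed size in bytes at 1 bit/pixel, each row padded to whole bytes."""
    w, h = page_pixels(dpi)
    row_bytes = (w + 7) // 8
    return row_bytes * h

def pixels_per_point(dpi):
    """How many pixels a 1-point (1/72 inch) stroke spans at this resolution."""
    return dpi / 72

if __name__ == "__main__":
    for dpi in (300, 400, 600):
        w, h = page_pixels(dpi)
        print(f"{dpi} DPI: {w} x {h} px, "
              f"{raw_bytes_1bit(dpi) / 1e6:.2f} MB raw, "
              f"{pixels_per_point(dpi):.1f} px per point")
```

The trade-off it shows: doubling the resolution from 300 to 600 DPI roughly quadruples the raw bitmap (about 1.05 MB to 4.21 MB per page) while doubling the number of pixels available to render a thin stroke or a small hand-set character.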
I like Doug's idea of shooting for HTML. I recall the multi-res
buttons on IBM's patent server, which give you an easy way to
browse thumbnails, then zoom in on the desired page at various
resolutions. Is there an off-the-shelf tool for doing this?
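Whether or not such a tool exists, the browse-then-zoom idea itself is simple to sketch. The Python below (standard library only) builds a ladder of zoom widths by repeatedly halving the full-resolution page width, then writes an HTML index where each page's thumbnail links to the full-size image and to every intermediate size. The file-naming scheme (`page001_w5100.png` and so on) and the 150-pixel minimum thumbnail width are hypothetical choices for illustration; producing the actual scaled-down images is left to whatever imaging tool you use.

```python
import html

def zoom_widths(full_width, min_width=150):
    """Widths for each zoom level: full size, then halved while >= min_width.

    The last (smallest) entry doubles as the thumbnail size.
    """
    widths = [full_width]
    while widths[-1] // 2 >= min_width:
        widths.append(widths[-1] // 2)
    return widths

def image_name(page, width):
    # Hypothetical naming scheme: one pre-scaled image per page per zoom level.
    return f"page{page:03d}_w{width}.png"

def build_index(num_pages, full_width, min_width=150):
    """Return an HTML page with one thumbnail per page plus links to each zoom level."""
    widths = zoom_widths(full_width, min_width)
    thumb = widths[-1]
    rows = ["<html><body>"]
    for page in range(1, num_pages + 1):
        # Text links to every available resolution of this page.
        links = " ".join(
            f'<a href="{html.escape(image_name(page, w))}">{w}px</a>' for w in widths
        )
        # Clicking the thumbnail jumps straight to the full-resolution scan.
        rows.append(
            f'<p><a href="{html.escape(image_name(page, widths[0]))}">'
            f'<img src="{html.escape(image_name(page, thumb))}" width="{thumb}"></a> '
            f"{links}</p>"
        )
    rows.append("</body></html>")
    return "\n".join(rows)
```

For a letter page scanned at 600 DPI (5100 pixels wide), `zoom_widths(5100)` gives six levels from 5100 down to 159 pixels; writing `build_index(120, 5100)` to an `index.html` would cover a hypothetical 120-page manual.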
- John