Eric Smith wrote:
Even at those resolutions, it can be difficult to tell some characters
apart, especially from poor-quality originals. But usually I can do
it if I study the scanned page very closely. No, OCR today cannot do
as good a job at that as I can. Someday OCR may be better. But
arbitrarily replacing the glyphs with other ones the software considers
"good enough" is going to f*&# up any possibility of doing this by
either a human OR OCR.
And all to make the file a little smaller. DVD-R costs about $0.25
to store 4.7GB of data, so I just can't get excited about using lossy
encoding for text and line-art pages that usually compress to no more
than 50K bytes per page with lossless G4.
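To put numbers on the storage argument, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (about $0.25 per 4.7 GB DVD-R, at most about 50K bytes per G4 page). Treating 4.7 GB as 4.7 billion bytes and "50K" as 50,000 bytes are my assumptions, and filesystem overhead is ignored.

```python
# Figures from the post: about $0.25 for a 4.7 GB DVD-R, and text/line-art
# pages that losslessly G4-encode to at most ~50K bytes each.
DVD_CAPACITY_BYTES = 4_700_000_000  # 4.7 GB taken as decimal gigabytes (assumption)
DVD_COST_USD = 0.25
MAX_PAGE_BYTES = 50_000             # "50K" taken as 50,000 bytes (assumption)

# Worst case: every page is as large as the stated upper bound.
pages_per_disc = DVD_CAPACITY_BYTES // MAX_PAGE_BYTES
cost_per_page_usd = DVD_COST_USD / pages_per_disc

print(pages_per_disc)  # 94000
print(f"${cost_per_page_usd:.8f} per page")
```

Even in that worst case it comes to roughly 94,000 pages per disc, or well under a thousandth of a cent per page.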
I completely agree with Eric. However, we should probably distinguish
between how we scan the material and how we distribute the scans.
Most of the work is in setting up the scanner, naming the files,
checking that all pages are there, etc. I don't want to do that twice, so I
scan at 300-400 dpi at least, usually with two bits per pixel, and put
the result on DLT as the original. Then I experiment with what I got. And
every few years, I check whether OCR has become good enough yet, or still not...
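For a sense of how big those raw masters are, here is a hypothetical helper (`raw_scan_bytes` is just an illustrative name) that computes the uncompressed size of one page. The resolutions and the two bits per pixel (four gray levels) come from the text above. The 8.5 x 11 inch US Letter page size is my own assumption, and file headers and row padding are ignored.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scanned page in bytes (hypothetical helper).

    Pixel dimensions are rounded to whole pixels and the total bit count is
    rounded up to a whole byte; no row padding or file headers are counted.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes

# US Letter (8.5 x 11 in) is an assumed page size for illustration.
for dpi in (300, 400):
    size = raw_scan_bytes(8.5, 11, dpi, bits_per_pixel=2)
    print(f"{dpi} dpi, 2 bpp: {size:,} bytes")
# 300 dpi, 2 bpp: 2,103,750 bytes
# 400 dpi, 2 bpp: 3,740,000 bytes
```

That is about 2.1 MB per page at 300 dpi and 3.7 MB at 400 dpi before any lossless compression, which is why it makes sense to keep such masters separate from the much smaller files used for distribution.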
Cheers