But what if you use multiple OCR programs? Say, three different OCR
programs, and then process the results by taking a majority vote on each
resulting character? Even if all three mangle 20%, it won't always be the
*same* 20% across all three, right? (And if I did my math right, all three
miss the same character only 0.8% of the time, assuming independent
errors; a 2-of-3 vote can still go wrong whenever at least two miss, which
is about 10.4% of the time, or roughly 89.6% accuracy.)
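A quick sketch to check the voting arithmetic. It assumes the engines err
independently and that their outputs are already aligned character for
character (real OCR output often differs in length and would need an
alignment step first). The function names are made up for illustration.
A 2-of-3 vote fails when at least two engines misread a character; 0.2^3 =
0.8% is only the chance that all three do.

```python
from collections import Counter
from math import comb


def majority_error_rate(p_err: float, n: int = 3) -> float:
    """Chance that more than half of n independent engines misread a character."""
    need = n // 2 + 1  # smallest number of wrong engines that defeats the vote
    return sum(
        comb(n, k) * p_err**k * (1 - p_err) ** (n - k)
        for k in range(need, n + 1)
    )


def vote(readings):
    """Per-character majority vote over equal-length strings.

    Emits '?' where no character wins a strict majority.
    """
    out = []
    for chars in zip(*readings):
        ch, count = Counter(chars).most_common(1)[0]
        out.append(ch if count > len(chars) / 2 else "?")
    return "".join(out)


print(round(majority_error_rate(0.2), 3))   # three engines, each 80% accurate
print(vote(["hello", "hcllo", "hellp"]))
```

The computed rate is an upper bound on wrong characters: when two engines
are wrong but disagree with each other, the vote above yields '?' instead
of a wrong character, which at least flags the spot for review.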

In the text OCR world, it's a mixed bag. I've tried this a bit in my day
job at FedEx. While you can get some improvement this way, I've found
that a slightly different approach helps even more. You start with a
greyscale image and do the binarizing yourself. Then you run the OCR engine
on the image several times, each time binarized with a different algorithm
or with different parameters. One of the commercial vendors does something
like this and gives it a fancy name, like virtual rescan or something like
that. At least with the material and techniques I was using, I found that
diminishing returns started to set in after about 3 passes.
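
A minimal sketch of the multi-pass idea, under loud assumptions: the image
is a plain list of rows of 0-255 grey values, the only binarization shown
is a global threshold with three different cutoffs (standing in for
"different algorithms and parameters"), `ocr_engine` is a hypothetical
callable you supply, and picking the most frequent result is just one
possible way to combine passes. None of this is the vendor's method or the
exact setup I used.

```python
from collections import Counter


def binarize(grey, threshold):
    """Global threshold: pixels darker than threshold become ink (1), others 0."""
    return [[1 if px < threshold else 0 for px in row] for row in grey]


def multi_pass_ocr(grey, ocr_engine, thresholds=(96, 128, 160)):
    """Run ocr_engine once per threshold; return (most common text, all texts).

    ocr_engine is a placeholder: any callable taking a binary image and
    returning a string. Three passes is the default because returns
    diminished beyond that in my experience.
    """
    results = [ocr_engine(binarize(grey, t)) for t in thresholds]
    best, _ = Counter(results).most_common(1)[0]
    return best, results
```

A per-character vote over the passes would be another way to combine them,
if the outputs can be aligned.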
BLS