> So, you're fortunate if you can get 80% accuracy with an OCR engine.
> But what if you use multiple OCR programs? Say, three different OCR
> programs, and then process the results taking a majority vote on each
> resulting character? Even if all three mangle 20%, it won't always
> be the *same* 20% across all three, right? (And if I did my math
> right, using three 80%-accurate programs reduces the error rate to
> just 0.8%, or a 99.2% accuracy rate.)
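A minimal Python sketch of the per-character vote described in the question. It assumes the three outputs are already aligned character for character, which real OCR output rarely is: engines insert, drop, and merge characters, so an alignment step would be needed first. The function name `majority_vote` and the `~` placeholder for "no majority" are illustrative choices, not part of any OCR tool.

```python
from collections import Counter

def majority_vote(readings, unknown="~"):
    """Combine OCR readings of the same text character by character.

    Assumes the readings are already aligned: same length, and position i in
    each reading is the same glyph on the page.
    """
    if len({len(r) for r in readings}) != 1:
        raise ValueError("readings must be non-empty and of equal length")
    voted = []
    for column in zip(*readings):
        ch, count = Counter(column).most_common(1)[0]
        # Keep a character only if a strict majority of engines agree on it.
        voted.append(ch if 2 * count > len(readings) else unknown)
    return "".join(voted)

print(majority_vote(["rodents", "r0dents", "rodenls"]))  # rodents
print(majority_vote(["cat", "cot", "cut"]))              # c~t
```

The second call shows the weak spot: when every engine guesses differently on the same glyph, there is nothing to vote with.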
That figure is right *if* those 20% errors are independent and randomly
distributed. I doubt that either part of that is so; they will all
tend to err on the dubious characters (FWVO "dubious"), if nothing
else.
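For the arithmetic: under independence, 0.8% (0.2^3) is the chance that all three engines err on the same character. A majority vote already fails when two of the three err, so independent 80%-accurate engines give roughly a 10.4% error rate, not 0.8%. The sketch below computes that, then contrasts it with a correlated case. The "hard characters" model is a hypothetical illustration of the point about dubious characters: a share of glyphs is hard for every engine at once, and engines are independent only given how hard the glyph is.

```python
from math import comb

def majority_error(p, n=3):
    """Chance that the correct character fails to win a strict majority of n
    votes, assuming each engine errs independently with probability p."""
    # With ceil(n/2) or more wrong readings, the right answer cannot get > n/2 votes.
    need = (n + 1) // 2
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

def mixture_majority_error(hard_fraction, p_hard, p_easy, n=3):
    """Hypothetical correlated model: a share of characters is 'hard' for every
    engine at once; engines are independent only given the character's difficulty."""
    return (hard_fraction * majority_error(p_hard, n)
            + (1 - hard_fraction) * majority_error(p_easy, n))

if __name__ == "__main__":
    print(f"all three wrong:      {0.2 ** 3:.1%}")             # 0.8%
    print(f"vote fails (indep.):  {majority_error(0.2):.1%}")  # 10.4%
    # 25% hard characters; each engine misreads 80% of those and none of the rest,
    # so every engine is still 80% accurate overall (0.25 * 0.8 = 0.2 error).
    print(f"vote fails (correl.): {mixture_majority_error(0.25, 0.8, 0.0):.1%}")  # 22.4%
```

In that made-up correlated case, voting does worse than any single engine, because all three tend to be wrong on the same glyphs.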
/~\ The ASCII                          der Mouse
\ / Ribbon Campaign
 X  Against HTML         mouse at rodents.montreal.qc.ca
/ \ Email!   7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B