It was thus said that the Great Chuck Guzis once stated:
So, you're fortunate if you can get 80% accuracy with an OCR engine.
Obviously, the manual cleanup process is very time-intensive. But it's all
we've got in the music world.
But what if you use multiple OCR programs? Say, three different OCR
programs and then process the results taking a majority vote on each
resulting character? Even if all three mangle 20%, it won't always be the
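*same* 20% across all three, right? (and if I did my math right, and the
errors are independent, all three miss the same character only 0.8% of the
time; a 2-of-3 vote still fails when any two miss, so the error rate drops
to about 10.4%, or an 89.6% accuracy rate)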
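A rough sketch of the voting idea in Python, for what it's worth. It
assumes the three outputs are already aligned symbol for symbol (a big
assumption for real OCR output) and that the engines err independently.
The names majority_vote and vote_error_rate are made up for illustration,
and the sample strings are just toy note names. The second function
computes the chance that more than half of the engines miss, which is
when the vote fails.

```python
from collections import Counter
from math import comb


def majority_vote(readings):
    """Vote position by position over equal-length strings.

    Emits '?' where no symbol has a strict majority, so those
    spots can be flagged for manual cleanup.
    """
    if len({len(r) for r in readings}) != 1:
        raise ValueError("readings must be aligned to the same length")
    voted = []
    for chars in zip(*readings):
        symbol, count = Counter(chars).most_common(1)[0]
        voted.append(symbol if count * 2 > len(chars) else "?")
    return "".join(voted)


def vote_error_rate(p_wrong, n=3):
    """Probability that more than half of n independent engines are wrong."""
    need = n // 2 + 1
    return sum(
        comb(n, k) * p_wrong**k * (1 - p_wrong) ** (n - k)
        for k in range(need, n + 1)
    )


print(majority_vote(["CDEFG", "CDXFG", "CDEFQ"]))  # CDEFG
print(round(vote_error_rate(0.2), 3))              # 0.104
```

If the engines tend to trip over the same smudges, the errors are not
independent and the real gain will be smaller than this estimate.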
I guess that depends on how the OCR program renders the *graphical*
representation of the music score. I.e. if it renders it in a *cleaner*
graphic format, then you've just traded one hard problem for another. :-(
I assume the originals are hand-drawn (though no doubt on a
mechanically reproduced/ruled staff)?