jim s wrote:
We were able to locate arrays like this using an
autocorrelation function.
In Python this was simple: read each page into a binary array. You
would also have a sample shape for 1 and for 0, as Alexandre suggests.
The autocorrelation function essentially matches the shapes.
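
A minimal sketch of that kind of shape matching, not the code from the
post above. It assumes the page is already a list of rows of 0/1 pixels
(1 = dark), and it uses a plain pixel-agreement score as a stand-in for a
proper correlation. The tiny glyphs ONE and ZERO and the helper names are
made up for illustration.

```python
# Hypothetical 5x3 sample shapes for "1" and "0" (1 = dark pixel).
ONE = [[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1]]
ZERO = [[1, 1, 1], [1, 0, 1], [1, 0, 1], [1, 0, 1], [1, 1, 1]]


def match_score(page, template, top, left):
    """Fraction of template pixels that agree with the page at (top, left)."""
    h, w = len(template), len(template[0])
    agree = sum(
        page[top + r][left + c] == template[r][c]
        for r in range(h)
        for c in range(w)
    )
    return agree / (h * w)


def best_match(page, template):
    """Slide the template over the page; return (score, top, left) of the best fit."""
    h, w = len(template), len(template[0])
    best = (-1.0, 0, 0)
    for top in range(len(page) - h + 1):
        for left in range(len(page[0]) - w + 1):
            score = match_score(page, template, top, left)
            if score > best[0]:
                best = (score, top, left)
    return best


def paste(page, glyph, top, left):
    """Copy a glyph into the page in place (only used to build the demo page)."""
    for r, row in enumerate(glyph):
        for c, pixel in enumerate(row):
            page[top + r][left + c] = pixel


# Demo page: 7 rows x 9 columns, blank, with a "1" and a "0" side by side.
page = [[0] * 9 for _ in range(7)]
paste(page, ONE, 1, 1)
paste(page, ZERO, 1, 5)

print(best_match(page, ONE))   # best position for the "1" shape
print(best_match(page, ZERO))  # best position for the "0" shape
```

On real scans the score will rarely be a perfect 1.0, so you would pick a
cut-off and treat anything below it as "needs a human look".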
I don't know if you even need to do that, depending on the font style - the
ratio of dark to light pixels in a given 'cell' might be enough to
determine if it's a 1 or 0. This sort of approach does assume that the
data's in a nice enough format (e.g. fixed width font, corrected for any
page distortion, little "outside" contamination) to work, though.
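
As a rough sketch of the dark-to-light idea, assuming the page is already
a clean grid of fixed-size cells stored as rows of 0/1 pixels (1 = dark).
The sample glyphs, the helper names and the threshold are all made up for
illustration. Which digit is "darker" depends entirely on the font; here
the sample "0" happens to have more dark pixels than the sample "1".

```python
# Hypothetical 5x3 glyphs (1 = dark pixel), only for the demo.
ONE = [[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1]]
ZERO = [[1, 1, 1], [1, 0, 1], [1, 0, 1], [1, 0, 1], [1, 1, 1]]


def dark_ratio(cell):
    """Fraction of pixels in the cell that are dark."""
    total = sum(len(row) for row in cell)
    dark = sum(sum(row) for row in cell)
    return dark / total


def split_cells(page, cell_h, cell_w):
    """Cut a page into a grid of cell_h x cell_w cells (fixed-width font assumed)."""
    grid = []
    for top in range(0, len(page) - cell_h + 1, cell_h):
        row_of_cells = []
        for left in range(0, len(page[0]) - cell_w + 1, cell_w):
            cell = [r[left:left + cell_w] for r in page[top:top + cell_h]]
            row_of_cells.append(cell)
        grid.append(row_of_cells)
    return grid


def classify_cell(cell, threshold):
    """Return 0 if the cell is darker than the threshold, else 1 (font-specific)."""
    return 0 if dark_ratio(cell) > threshold else 1


# Threshold halfway between the two sample glyphs' ratios (an assumption).
THRESHOLD = (dark_ratio(ONE) + dark_ratio(ZERO)) / 2

# Demo page: one text row holding "1" then "0", with no gaps between cells.
page = [one_row + zero_row for one_row, zero_row in zip(ONE, ZERO)]

bits = [
    [classify_cell(cell, THRESHOLD) for cell in row]
    for row in split_cells(page, 5, 3)
]
print(bits)
```

Any cell whose ratio lands close to the threshold is a good candidate for
checking by eye.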
Of course it also doesn't help the OP, who was looking for an existing
program to do it ;)
(And I don't know if any OCR process can be fully trusted, so it still
needs some form of human validation, or retention of the original scans
alongside the OCR data)
cheers
Jules