On 17 January 2012 08:50, Camiel Vanderhoeven <iamcamiel at gmail.com> wrote:
Hi Everyone,
I have a bunch of PDF files that contain the microcode listings for an IBM
7201-02 CE (an enhanced System/360 Model 65), like this one:
http://ibm360-console.wikispaces.com/file/view/QZ001.pdf. I need their
contents for the emulator that drives my '65 control panel. Unfortunately,
the OCR software I have tries to recognize English words and makes
gibberish out of them. I'm only interested in the 1's and 0's, so it would
be wonderful if there were OCR software that you could tell to look only for
0's and 1's (or that at least has some bias towards recognizing characters
as a 1 or 0).
Is anyone here aware of such software, or can anyone recommend a program
that might do a good job with these?
I am not sure how helpful this answer will be, but Tesseract
(originally a commercial HP product, now on Google Code) has training
files for different languages. I have never modified them, but my
guess, from looking at the documentation, is that you could make
training files for a "binary language" that contains only the
characters 0 and 1.
N.
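
For what it is worth, a lighter-weight alternative to making new training
files may be Tesseract's tessedit_char_whitelist config variable, which
restricts the characters the recognizer is allowed to output. This is a
different technique from the training-file idea above, and whether it works
depends on the Tesseract version and engine in use, so treat the following
as a minimal sketch under those assumptions rather than a tested recipe.
Tesseract does not read PDFs directly, so the pages are assumed to have been
converted to images first. The function names, the lookalike table, and the
"--psm 6" page segmentation choice are all illustrative assumptions (older
3.x releases spell that option "-psm").

```python
import subprocess

# Assumption: typical OCR confusions for the digits 0 and 1.
# Adjust this table after looking at real output from the scans.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "|": "1"})


def build_tesseract_command(image_path, output_base):
    """Build a tesseract command line that only allows 0 and 1 as output.

    Hypothetical helper. Tesseract writes its result to output_base + ".txt".
    """
    return [
        "tesseract", image_path, output_base,
        "--psm", "6",                       # assume one uniform block of text
        "-c", "tessedit_char_whitelist=01",  # restrict output to 0 and 1
    ]


def normalize_bits(text):
    """Map lookalike characters to 0/1 and drop everything else.

    Spaces and newlines are kept so the row/column layout survives.
    """
    mapped = text.translate(LOOKALIKES)
    return "".join(ch for ch in mapped if ch in "01 \n")


def ocr_bits(image_path, output_base):
    """Run tesseract on one page image and return the cleaned bit text."""
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    with open(output_base + ".txt", encoding="utf-8") as fh:
        return normalize_bits(fh.read())
```

Calling ocr_bits("page.png", "page") would leave the raw OCR result in
page.txt and return the cleaned text. The normalize_bits step is there as a
safety net for versions where the whitelist is ignored or only partly
honoured; it cannot recover digits that were misread as something outside
the lookalike table, so spot-checking against the scans is still needed.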