OCR software


aek＠bitsavers.org

1 Sep 2006

11:45 a.m.

...

HP developed an OCR engine called Tesseract that is supposed to be pretty good. They released it to the open-source world, and Google has picked it up and started working on it.

classiccmp list member James Markevitch has been working on an OCR program as well, optimized for column-formatted input, like listings. I was just talking to Doron Swade (the person responsible for the Difference Engine at the British Science Museum) and he is interested in OCR of mathematical tables (also column-oriented like listings).


julesrichardsonuk＠yahoo.co.uk

1 Sep

1:05 p.m.

Al Kossow wrote:

...

HP developed an OCR engine called Tesseract that is supposed to be pretty good. They released it to the open-source world, and Google has picked it up and started working on it.

classiccmp list member James Markevitch has been working on an OCR program as well, optimized for column-formatted input, like listings.

Cross-platform, or one specific OS? I started putting some stuff together to allow a user to graphically describe a scanned page (so you'd roughly mark out what were images, what were columns of text etc.) prior to feeding to an OCR engine, as experience of commercial products has been that they tend to get it wrong too much to be left to run without user input. Unfortunately the Linux OCR engines available proved to be just too poor in quality to make it worthwhile, so I shelved it until something better came along - maybe Tesseract will do the job.
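The region-marking idea above can be sketched roughly as follows. This is a minimal illustration only, not the tool described in the message: the `Region` class, the `"text"`/`"image"` labels, the `reading_order` and `ocr_page` functions, and the `ocr_engine` callable are all hypothetical names chosen for the example. It assumes each marked area is a simple pixel box and that text columns should be read left to right, then top to bottom.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A pixel box: (left, top, right, bottom). Hypothetical representation.
Box = Tuple[int, int, int, int]


@dataclass
class Region:
    """One user-marked area of a scanned page."""
    kind: str  # "text" or "image" (labels assumed for this sketch)
    box: Box


def reading_order(regions: List[Region]) -> List[Region]:
    """Keep only text regions, ordered left-to-right, then top-to-bottom."""
    text = [r for r in regions if r.kind == "text"]
    return sorted(text, key=lambda r: (r.box[0], r.box[1]))


def ocr_page(regions: List[Region], ocr_engine: Callable[[Box], str]) -> str:
    """Run the engine on each text region and join the results.

    ocr_engine is a placeholder for whatever engine is used; it receives
    the box to recognise and returns the text found there.
    """
    return "\n".join(ocr_engine(r.box) for r in reading_order(regions))


if __name__ == "__main__":
    page = [
        Region("text", (400, 50, 700, 900)),  # right-hand column
        Region("image", (50, 50, 350, 300)),  # figure, skipped
        Region("text", (50, 320, 350, 900)),  # left-hand column
    ]

    # Stand-in engine that only reports which box it was handed.
    def fake_engine(box: Box) -> str:
        return f"<text from {box}>"

    print(ocr_page(page, fake_engine))
```

Passing the engine in as a callable keeps the page description independent of any particular OCR program, so a better engine could be swapped in later without changing the marked-up regions.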

...

I was just talking to Doron Swade (the person responsible for the Difference Engine at the British Science Museum) and he is interested in OCR of mathematical tables (also column-oriented like listings).

I've never actually met Doron, although his name tends to crop up an awful lot. I think he's possibly up at our museum next Friday, but I'll be on a plane at that point...

cheers

Jules

mcguire＠neurotica.com

2 Sep

12:49 p.m.

On Sep 1, 2006, at 4:05 PM, Jules Richardson wrote:

...

HP developed an OCR engine called Tesseract that is supposed to be pretty good. They released it to the open-source world, and Google has picked it up and started working on it.

classiccmp list member James Markevitch has been working on an OCR program as well, optimized for column-formatted input, like listings.

Cross-platform, or one specific OS?

At first glance, it appears to be Linux-specific, but that's generally pretty easy to un-do. The important part is it's not Windoze software.

...

I started putting some stuff together to allow a user to graphically describe a scanned page (so you'd roughly mark out what were images, what were columns of text etc.) prior to feeding to an OCR engine, as experience of commercial products has been that they tend to get it wrong too much to be left to run without user input. Unfortunately the Linux OCR engines available proved to be just too poor in quality to make it worthwhile, so I shelved it until something better came along - maybe Tesseract will do the job.

It's possible...might be worth looking into.

-Dave

--
Dave McGuire
Cape Coral, FL
