OCR software

1 Sep 2006

Al Kossow wrote:
...
  HP developed an OCR engine called Tesseract that is supposed to be
  pretty good. They released it to the open-source world, and Google has
  picked it up and started working on it.
  classiccmp list member James Markevitch has been working on an OCR program
  as well, optimized for column-formatted input, like listings.
Cross-platform, or one specific OS?
I started putting some stuff together to allow a user to graphically describe
a scanned page (so you'd roughly mark out what were images, what were columns
of text etc.) prior to feeding to an OCR engine, as experience of commercial
products has been that they tend to get it wrong too much to be left to run
without user input. Unfortunately the Linux OCR engines available proved to be
just too poor in quality to make it worthwhile, so I shelved it until
something better came along - maybe Tesseract will do the job.
...
  I was just talking to Doron Swade (the person responsible for the Difference
  Engine at the British Science Museum) and he is interested in OCR of
  mathematical tables (also column-oriented, like listings).
I've never actually met Doron, although his name tends to crop up an awful
lot. I think he's possibly up at our museum next Friday, but I'll be on a
plane at that point...
cheers
Jules

