Scanning formats

15 Aug 2006

...
    But what if you use multiple OCR programs?  Say, three different OCR
    programs, and then process the results by taking a majority vote on each
    resulting character?  Even if all three mangle 20%, it won't always be the
    *same* 20% across all three, right?  (And if I did my math right, using
    three 80% accurate programs reduces the error rate to just 0.8%, or a
    99.2% accuracy rate.)
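Here is a quick check of the arithmetic in the question. It assumes, as the question implicitly does, that each engine's errors are independent of the others'. Under that assumption, 0.8% (0.2^3) is the chance that all three engines misread the same character at once. A majority vote, however, fails as soon as two of the three are wrong. With per-character accuracy p, at least two of three engines are correct with probability p^3 + 3p^2(1 - p). For p = 0.8 that is 0.896, so the error rate is roughly 10.4% rather than 0.8%.

The sketch below computes that figure and shows a character-by-character vote. `majority_vote_accuracy` and `vote` are hypothetical helpers written for illustration. `vote` also assumes the three outputs are already aligned character for character, which real OCR output usually is not.

```python
from collections import Counter
from math import comb


def majority_vote_accuracy(p: float, n: int = 3) -> float:
    """Probability that a strict majority of n engines, each independently
    correct with probability p, reads a character correctly."""
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def vote(readings: list[str]) -> str:
    """Character-by-character majority vote over equal-length OCR outputs.
    Assumes the outputs are already aligned; '?' marks positions with no majority."""
    result = []
    for chars in zip(*readings):
        ch, count = Counter(chars).most_common(1)[0]
        result.append(ch if count > len(chars) // 2 else "?")
    return "".join(result)


if __name__ == "__main__":
    p = 0.8
    print(f"all three wrong:   {(1 - p) ** 3:.3f}")  # 0.008
    print(f"majority accuracy: {majority_vote_accuracy(p):.3f}")  # 0.896
    print(vote(["FedEx", "FcdEx", "FedFx"]))  # FedEx
```

In practice OCR engines tend to stumble on the same smudged or broken glyphs, so their errors are correlated. Real gains from voting are therefore usually smaller than even the independent-error figure.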
In the text OCR world, it's a mixed bag.  I've tried this some in my day
job at FedEx.  While you can get some improvement this way, I've found that
a slightly different approach helps even more.  You start with a greyscale
image and do the binarizing yourself.  Then you run the OCR engine on the
image several times, binarizing it a different way on each pass: several
different binarization algorithms, with different parameters for each.
One of the commercial vendors does something like this and gives it a fancy
name, "virtual rescan" or something like that.  At least with the material
and techniques I was using, I found that diminishing returns set in after
about three passes.
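The multi-binarization approach can be sketched in pure Python. This is a minimal illustration under stated assumptions. It is not the commercial vendor's feature or the code used at FedEx.

The assumptions are:

* The greyscale image is a list of rows of 0-255 values.
* The three binarizers and their parameters are chosen for illustration only: a fixed threshold of 128, Otsu's global threshold, and a local-mean adaptive threshold.
* `ocr` is a hypothetical stand-in for whatever OCR engine is being driven.
* The passes are combined by a simple vote on the whole reading, because the post does not say how the results were merged.

```python
from collections import Counter
from typing import Callable, Sequence

Image = list[list[int]]   # greyscale rows, 0 = black .. 255 = white
Binary = list[list[int]]  # 1 = ink, 0 = paper


def threshold(img: Image, t: int) -> Binary:
    """Global threshold: any pixel darker than t becomes ink."""
    return [[1 if px < t else 0 for px in row] for row in img]


def otsu_threshold(img: Image) -> int:
    """Otsu's method: pick the t that maximizes between-class variance
    when splitting pixels into < t (ink) and >= t (paper)."""
    hist = [0] * 256
    for row in img:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(v * n for v, n in enumerate(hist))
    best_t, best_var = 128, -1.0
    w_ink, sum_ink = 0, 0
    for t in range(1, 256):
        w_ink += hist[t - 1]
        sum_ink += (t - 1) * hist[t - 1]
        w_paper = total - w_ink
        if w_ink == 0 or w_paper == 0:
            continue  # one class is empty, so this is not a real split
        diff = sum_ink / w_ink - (sum_all - sum_ink) / w_paper
        var_between = w_ink * w_paper * diff * diff  # unnormalized; same argmax
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def local_mean(img: Image, radius: int, offset: int) -> Binary:
    """Adaptive threshold: ink if darker than the mean of the pixels within
    `radius` (window clipped at the image edges) minus `offset`."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = 1 if img[y][x] < sum(vals) / len(vals) - offset else 0
    return out


# Hypothetical pass list: the algorithms and parameters are illustrative only.
PASSES: list[Callable[[Image], Binary]] = [
    lambda im: threshold(im, 128),
    lambda im: threshold(im, otsu_threshold(im)),
    lambda im: local_mean(im, radius=7, offset=10),
]


def multi_pass_ocr(img: Image, ocr: Callable[[Binary], str],
                   passes: Sequence[Callable[[Image], Binary]] = PASSES) -> str:
    """Binarize the greyscale image once per pass, OCR each result, and
    return the most common reading (ties go to the earliest pass)."""
    readings = [ocr(binarize(img)) for binarize in passes]
    return Counter(readings).most_common(1)[0][0]


if __name__ == "__main__":
    page = [[200, 200, 200], [200, 20, 200], [200, 200, 200]]
    # Stand-in "OCR engine" for the demo: reports whether any ink survived.
    fake_ocr = lambda b: "dot" if any(map(any, b)) else "blank"
    print(multi_pass_ocr(page, fake_ocr))  # dot
```

Three passes are used here only because that is where the post reports diminishing returns. Adding a pass is just another entry in `PASSES`.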

BLS
