Although it doesn't really know what text is, per se,
one of its algorithms is to find glyph-like things. Once it has all the
glyph-like things isolated on a page, it compares them to each other,
and if two glyphs are similar enough, it just represents them both
(or N of them) with one compressed glyph image.
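The matching step described above can be sketched roughly like this (a toy
illustration, not the compressor's actual code; the pixel-count similarity
test, the threshold, and all names here are my own assumptions):

```python
def hamming(a, b):
    """Count differing pixels between two equal-sized binary bitmaps."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def compress_glyphs(glyphs, threshold):
    """Return (dictionary, indices): each glyph is replaced by the index
    of the first stored glyph it is 'similar enough' to."""
    dictionary, indices = [], []
    for g in glyphs:
        for i, d in enumerate(dictionary):
            same_size = len(d) == len(g) and len(d[0]) == len(g[0])
            if same_size and hamming(g, d) <= threshold:
                indices.append(i)   # reuse an existing glyph image (lossy!)
                break
        else:
            dictionary.append(g)    # new symbol class
            indices.append(len(dictionary) - 1)
    return dictionary, indices

# Two near-identical "e"-like shapes and one clearly different shape:
e1 = [[0,1,1,0],[1,0,0,1],[1,1,1,1],[1,0,0,0]]
e2 = [[0,1,1,0],[1,0,0,1],[1,1,1,1],[1,0,0,1]]   # differs by one pixel
o  = [[0,1,1,0],[1,0,0,1],[1,0,0,1],[0,1,1,0]]
dictionary, idx = compress_glyphs([e1, e2, o], threshold=2)
print(len(dictionary), idx)   # prints: 2 [0, 0, 1]
```

Note that e1 and e2 collapse into one stored image: if they had actually
been different symbols, that substitution is exactly the kind of silent
error being discussed.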
That looks like information loss to me. If one of those glyph-like
things was not the same symbol as the others, then the algorithm
has just introduced an error.
So for OCR purposes, I don't think this type of compression really
hurts -- it replaces one plausible "e" image with another one.
But one of them might have been something other than an "e".
Antonio
--
---------------
Antonio Carlini arcarlini(a)iee.org