Although it doesn't really know what text is, per se,
one of its algorithms is to find glyph-like things. Once it has all the
glyph-like things isolated on a page, it compares them to each other,
and if two glyphs are similar enough, it just represents them both
(or N of them) with one compressed glyph image.
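The matching step described above can be sketched roughly like this (a toy
illustration, not the compressor's actual code; the pixel-count similarity
test, the threshold, and all names here are my own assumptions):

```python
def hamming(a, b):
    """Count differing pixels between two equal-sized binary bitmaps."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def compress_glyphs(glyphs, threshold):
    """Return (dictionary, indices): each glyph is replaced by the index
    of the first stored glyph it is 'similar enough' to."""
    dictionary, indices = [], []
    for g in glyphs:
        for i, d in enumerate(dictionary):
            same_size = len(d) == len(g) and len(d[0]) == len(g[0])
            if same_size and hamming(g, d) <= threshold:
                indices.append(i)   # reuse an existing glyph image (lossy!)
                break
        else:
            dictionary.append(g)    # new symbol class
            indices.append(len(dictionary) - 1)
    return dictionary, indices

# Two near-identical "e"-like shapes and one clearly different shape:
e1 = [[0,1,1,0],[1,0,0,1],[1,1,1,1],[1,0,0,0]]
e2 = [[0,1,1,0],[1,0,0,1],[1,1,1,1],[1,0,0,1]]   # differs by one pixel
o  = [[0,1,1,0],[1,0,0,1],[1,0,0,1],[0,1,1,0]]
dictionary, idx = compress_glyphs([e1, e2, o], threshold=2)
print(len(dictionary), idx)   # prints: 2 [0, 0, 1]
```

Note that e1 and e2 collapse into one stored image: if they had actually
been different symbols, that substitution is exactly the kind of silent
error being discussed.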
That looks like information loss to me. If one of those glyph-like
things was not the same symbol as the others, then the algorithm
has just introduced an error.
So for OCR purposes, I don't think this type of compression really
hurts -- it replaces one plausible "e" image with another one.
But one of them might have been something other than an "e".
Antonio
--
---------------
Antonio Carlini arcarlini(a)iee.org