Although it doesn't really know what text is, per se, one of its
algorithms is to find glyph-like things. Once it has all the
glyph-like things on a page isolated, it compares them all to each
other, and if two glyphs are similar enough, it just represents them
both (or N of them) with one compressed glyph image.
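The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the encoder's actual code: the function names and the pixel-count similarity test are my own assumptions, and real symbol-matching encoders (JBIG2, for instance) use far more sophisticated matchers.

```python
# Sketch of glyph-substitution compression: hypothetical names, and a
# naive Hamming-distance matcher standing in for the real similarity test.

def hamming_distance(a, b):
    """Count mismatching pixels between two equal-sized binary glyph bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def dedupe_glyphs(glyphs, threshold=1):
    """Map each glyph to the first 'similar enough' representative.

    Returns (representatives, index_per_glyph).  If the threshold is too
    loose, two genuinely different symbols can be merged into one image --
    the information loss discussed in this thread.
    """
    reps, indices = [], []
    for g in glyphs:
        for i, r in enumerate(reps):
            if len(r) == len(g) and hamming_distance(r, g) <= threshold:
                indices.append(i)   # reuse an existing representative
                break
        else:
            reps.append(g)          # no close match: keep this glyph as-is
            indices.append(len(reps) - 1)
    return reps, indices
```

With a loose enough threshold, two scans of the same letter collapse to one stored image, which is exactly where a different-but-similar symbol can be silently swapped for the wrong one.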
That looks like information loss to me. If one of those glyph-like
things was not the same symbol as the others, then the algorithm
has just introduced an error.
So for OCR purposes, I don't think this type of compression really
hurts -- it replaces one plausible "e" image with another one.
But one of them might have been something other than an "e".
Antonio
--
---------------
Antonio Carlini arcarlini(a)iee.org