On Thu, 2005-06-02 at 16:58 -0400, Paul Koning wrote:
>>>> "Jan-Benedict" == Jan-Benedict Glaw <jbglaw at lug-owl.de> writes:
> Ghostscript reads PDF files every bit as well as PS files, and
> it's open source...
Jan-Benedict> You didn't answer my question :-) Suppose I prepare a
Jan-Benedict> TIFF file that contains (in additional tags) e.g. some
Jan-Benedict> raw OCRed text, not proofread. Now I prepare a PDF
Jan-Benedict> from this and use gs to get the image back. Is my text
Jan-Benedict> still there? Or do I get an image that "looks" almost like
Jan-Benedict> the original, but doesn't contain my extra data?
Oh. I didn't know TIFF could do that; I certainly would never store
text in a TIFF file, any more than I would store images in a DOC file.
It's a pretty cool format for that kind of thing; I suppose that, as
with HTML, there is a bare minimum set of tags which any decoder should
support, and a decoder should be able to skip over anything it can't
handle and still output an image.
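For illustration only, here is a minimal Python sketch of that "skip what you don't understand" behaviour. It reads the first image file directory (IFD) of a classic TIFF byte string, keeps a few baseline TIFF 6.0 tags and merely records the numbers of all the others. The function name `read_first_ifd` and the deliberately partial tag table are hypothetical, not taken from any real decoder. Note that a decoder written this way would also silently drop a private tag carrying OCR text, which is the kind of loss Jan-Benedict is asking about.

```python
import struct

# A handful of baseline TIFF 6.0 tag numbers; deliberately partial (hypothetical table).
BASELINE_TAGS = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
    273: "StripOffsets",
    277: "SamplesPerPixel",
    278: "RowsPerStrip",
    279: "StripByteCounts",
}

SHORT = 3  # TIFF field type code for a 16-bit unsigned integer


def read_first_ifd(data: bytes):
    """Return (known, skipped) for the first IFD of a TIFF byte string.

    known maps tag names to values (a single SHORT is decoded; anything
    else is returned as the raw 4-byte value field); skipped lists the
    tag numbers this sketch does not recognise.
    """
    byte_orders = {b"II": "<", b"MM": ">"}  # Intel = little-endian, Motorola = big-endian
    if data[:2] not in byte_orders:
        raise ValueError("not a TIFF file")
    order = byte_orders[data[:2]]
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    known, skipped = {}, []
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, field_type, n = struct.unpack_from(order + "HHI", data, entry)
        if tag not in BASELINE_TAGS:
            skipped.append(tag)  # unknown or private tag: note it and move on
            continue
        if field_type == SHORT and n == 1:
            # A single SHORT sits in the first 2 bytes of the 4-byte value field.
            (value,) = struct.unpack_from(order + "H", data, entry + 8)
        else:
            # Raw 4-byte field: an inline LONG, packed smaller values,
            # or an offset to out-of-line data (not followed here).
            (value,) = struct.unpack_from(order + "I", data, entry + 8)
        known[BASELINE_TAGS[tag]] = value
    return known, skipped
```

A real reader would decode every field type and follow value offsets for multi-valued tags; this sketch does neither.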
TIFF tends to be let down by bad decoder code, though. In particular,
decoders typically either can't handle multi-page images (or handle them
badly), or they don't support all the common compression schemes out
there.
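As far as the file format goes, a multi-page TIFF is just a chain of IFDs: each directory ends with the offset of the next one, and an offset of 0 ends the chain. The Python sketch below (the function name `count_pages` is hypothetical) counts pages by following that chain, with a guard against looping offsets in damaged files. It assumes classic TIFF (magic number 42), so BigTIFF and SubIFDs are out of scope.

```python
import struct


def count_pages(data: bytes, max_pages: int = 10_000) -> int:
    """Count the IFDs (pages) in a classic TIFF byte string by following next-IFD offsets."""
    byte_orders = {b"II": "<", b"MM": ">"}  # Intel = little-endian, Motorola = big-endian
    if data[:2] not in byte_orders:
        raise ValueError("not a TIFF file")
    order = byte_orders[data[:2]]
    magic, offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    seen = set()
    pages = 0
    while offset != 0:
        # Refuse to loop forever on a corrupt file whose chain points backwards.
        if offset in seen or pages >= max_pages:
            raise ValueError("IFD chain loops or is implausibly long")
        seen.add(offset)
        (n_entries,) = struct.unpack_from(order + "H", data, offset)
        pages += 1
        # The next-IFD offset follows the n_entries 12-byte directory entries.
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * n_entries)
    return pages
```

Following the chain is the easy part; a viewer that only ever reads the first IFD is one plausible way a decoder ends up showing just page one.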
cheers
Jules