On Thu, 2005-06-02 at 22:49 +0200, Jan-Benedict Glaw wrote:
On Thu, 2005-06-02 16:39:58 -0400, Paul Koning <pkoning at equallogic.com> wrote:
>>>> "Jan-Benedict" == Jan-Benedict Glaw <jbglaw at lug-owl.de> writes:
Jan-Benedict> Do we actually *have* the tools? We've got tumble to
Jan-Benedict> assemble a PDF file, but do we have proper tools to
Jan-Benedict> disassemble one? ...and I really mean exporting the
Jan-Benedict> initial TIFF, not something that looks like it.
Ghostscript reads PDF files every bit as well as PS files, and it's
open source...
You didn't answer my question :-) Consider I prepare a TIFF file that
contains (with additional tags) e.g. some raw OCRed text, not
read-checked. Now I prepare a PDF from this and use gs to get the image
back. Is my text still there? Or do I get an image that "looks" almost
like the original, but doesn't contain my extra data?
Hmm, 'no' seems to be the answer. Or at least when I use ImageMagick
(which seems to call into ghostscript in the case of manipulating PDF
files) it's not preserving the TIFF comment field.
I did the following:
- Created a small TIFF image with Gimp, and saved it with "this is a
comment" in the comment field.
- Verified that the comment was in place by running 'identify' on the
TIFF file.
- Converted the single TIFF file to a PDF using ImageMagick's convert
utility (which calls into Ghostscript libraries AFAIK)
- Converted the resulting PDF file back to a single TIFF image with
convert.
- Ran identify again on the resulting TIFF file, and the comment had
changed to: "Image generated by ESP Ghostscript (device=pnmraw)"
... so it looks like any TIFF 'metadata' isn't getting preserved.
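For anyone who wants to repeat the test, the steps above can be sketched
as a small Python script. This is only a rough sketch, not exactly what I
typed: the file names and the two function names are made up for
illustration, and it assumes ImageMagick's 'identify' and 'convert' are on
the PATH (newer ImageMagick releases call the latter 'magick'). The "%c"
format escape asks identify to print just the comment.

```python
import subprocess


def roundtrip_commands(src_tiff, pdf, out_tiff):
    """Build the ImageMagick command lines for a TIFF -> PDF -> TIFF test.

    All three file names are supplied by the caller (hypothetical paths).
    """
    return [
        ["identify", "-format", r"%c\n", src_tiff],  # comment before
        ["convert", src_tiff, pdf],                  # TIFF -> PDF
        ["convert", pdf, out_tiff],                  # PDF -> TIFF
        ["identify", "-format", r"%c\n", out_tiff],  # comment after
    ]


def run_roundtrip(src_tiff, pdf="roundtrip.pdf", out_tiff="roundtrip.tif"):
    """Run the four commands and return (comment_before, comment_after)."""
    outputs = []
    for cmd in roundtrip_commands(src_tiff, pdf, out_tiff):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs.append(result.stdout.strip())
    return outputs[0], outputs[-1]
```

Calling run_roundtrip("test.tif") and comparing the two returned strings
shows whether the comment survived the trip through PDF.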
Looking at the PDF file, I'm not convinced there's any TIFF data in
there, to be honest. It looks more like the image is re-encoded from the
input TIFF to PDF's own way of storing bitmap data - in other words it's
not simply a wrapper for a bunch of TIFF images, but merely a wrapper
for bitmap data in PDF's own format. That's something of a
disappointment; I always thought PDF just encapsulated the input images
rather than re-encoding them in any way...
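One rough way to check how the bitmap is stored is to look for image
objects in the PDF and see which /Filter each one declares (for example
DCTDecode for JPEG data, FlateDecode for zlib, CCITTFaxDecode for fax
encoding). The sketch below is an assumption-laden illustration, not a
real PDF parser: the function name is made up, it only scans the raw
bytes with regular expressions, and it will miss objects packed into
compressed object streams.

```python
import re

# One indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
# Dictionaries of image XObjects declare /Subtype /Image.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
# /Filter is either a single name or an array of names.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def image_filters(pdf_bytes):
    """Return one list of filter names per image object found."""
    found = []
    for obj in OBJ_RE.finditer(pdf_bytes):
        # Keep only the dictionary part, before the stream data begins.
        header = obj.group(1).split(b"stream", 1)[0]
        if not IMAGE_RE.search(header):
            continue
        match = FILTER_RE.search(header)
        names = NAME_RE.findall(match.group(1)) if match else []
        found.append([name.decode("ascii") for name in names])
    return found
```

Feeding it open("file.pdf", "rb").read() lists the encodings used for
each image it can see; an empty inner list means the image data carries
no filter at all.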
cheers
Jules