Question about PDF manipulation

2 Jun 2005

On Thu, 2005-06-02 at 22:49 +0200, Jan-Benedict Glaw wrote:
...
  On Thu, 2005-06-02 16:39:58 -0400, Paul Koning <pkoning at equallogic.com> wrote:

>>>> "Jan-Benedict" == Jan-Benedict Glaw <jbglaw at lug-owl.de> writes:

  Jan-Benedict> Do we actually *have* the tools? We've got tumble to
  Jan-Benedict> assemble a PDF file, but do we have proper tools to
  Jan-Benedict> disassemble one? ...and I really mean exporting the
  Jan-Benedict> initial TIFF, not something that looks like it.

 Ghostscript reads PDF files every bit as well as PS files, and it's
 open source...

 You didn't answer my question :-)  Consider I prepare a TIFF file that
 contains (with additional tags) e.g. some raw OCRed text, not
 read-checked. Now I prepare a PDF from this and use gs to get the image
 back.  Is my text still there? Or do I get an image that "looks" almost
 like the original, but doesn't contain my extra data?

Hmm, 'no' seems to be the answer. Or at least when I use ImageMagick
(which seems to call into ghostscript in the case of manipulating PDF
files) it's not preserving the TIFF comment field.

I did the following:

  - Created a small TIFF image with Gimp, and saved it with "this is a
comment" in the comment field.
  - Verified that the comment was in place by running 'identify'
on the TIFF file.
  - Converted the single TIFF file to a PDF using ImageMagick's convert
utility (which calls into Ghostscript libraries AFAIK)
  - Converted the resulting PDF file back to a single TIFF image with
convert.
  - Ran identify again on the resulting TIFF file, and the comment's now
changed to: "Image generated by ESP Ghostscript (device=pnmraw)"

... so it looks like any TIFF 'metadata' isn't getting preserved. 
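
For anyone who wants to check the tag directly rather than through 'identify', here is a minimal Python sketch. It assumes the comment field ends up in the TIFF ImageDescription tag (270), which I have not confirmed for Gimp. The helper names build_tiny_tiff and read_ascii_tag are made up for illustration, and the builder writes only enough structure for the reader to work on.

```python
import struct

# Assumption: the "comment" field maps to the ImageDescription tag.
IMAGE_DESCRIPTION = 270


def build_tiny_tiff(comment):
    """Build a 1x1 grayscale little-endian TIFF that carries `comment`."""
    text = comment.encode("ascii") + b"\x00"
    if len(text) <= 4:
        raise ValueError("this sketch needs a comment of 4+ characters")

    def short(tag, value):   # IFD entry of type SHORT (3), value stored inline
        return struct.pack("<HHIHH", tag, 3, 1, value, 0)

    def long_(tag, value):   # IFD entry of type LONG (4), value stored inline
        return struct.pack("<HHII", tag, 4, 1, value)

    n_entries = 9
    data_off = 8 + 2 + 12 * n_entries + 4   # header + IFD, then the text
    pixel_off = data_off + len(text)        # one pixel byte after the text
    entries = [
        short(256, 1), short(257, 1),       # ImageWidth, ImageLength
        short(258, 8), short(259, 1),       # BitsPerSample, Compression
        short(262, 1),                      # PhotometricInterpretation
        struct.pack("<HHII", IMAGE_DESCRIPTION, 2, len(text), data_off),
        long_(273, pixel_off),              # StripOffsets
        long_(278, 1), long_(279, 1),       # RowsPerStrip, StripByteCounts
    ]
    ifd = struct.pack("<H", n_entries) + b"".join(entries) + struct.pack("<I", 0)
    return b"II" + struct.pack("<HI", 42, 8) + ifd + text + b"\x00"


def read_ascii_tag(tiff, wanted):
    """Return the ASCII value of tag `wanted` in the first IFD, or None."""
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_off = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    (count,) = struct.unpack_from(order + "H", tiff, ifd_off)
    for i in range(count):
        entry_off = ifd_off + 2 + 12 * i
        tag, typ, n = struct.unpack_from(order + "HHI", tiff, entry_off)
        if tag != wanted or typ != 2:       # type 2 is ASCII
            continue
        if n <= 4:                          # short values live in the entry
            raw = tiff[entry_off + 8:entry_off + 8 + n]
        else:                               # longer ones live at an offset
            (off,) = struct.unpack_from(order + "I", tiff, entry_off + 8)
            raw = tiff[off:off + n]
        return raw.rstrip(b"\x00").decode("ascii", "replace")
    return None


tiff = build_tiny_tiff("this is a comment")
print(read_ascii_tag(tiff, IMAGE_DESCRIPTION))  # this is a comment
```

Running read_ascii_tag on the file before and after the PDF round trip would show whether the tag survived, independent of how any particular tool chooses to report it.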

Looking at the PDF file, I'm not convinced there's any TIFF data in
there to be honest. It looks more like the image is re-encoded from the
input TIFF to PDF's own way of storing bitmap data - in other words it's
not simply a wrapper for a bunch of TIFF images, but merely a wrapper
for bitmap data in PDF's own format. That's something of a
disappointment; I always thought PDF just encapsulated the input images
rather than re-encoding in any way...
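
To make that concrete: a PDF keeps a bitmap as an image XObject, i.e. a dictionary (/Width, /Height, /ColorSpace, /BitsPerComponent, /Filter) followed by a stream of sample data, and there is no slot in that dictionary for arbitrary TIFF tags. The rough Python sketch below lists the /Filter of each image object in a PDF. It is only a byte-level scan under simplifying assumptions: the object dictionaries are stored uncompressed, and the stream data does not happen to contain the bytes "endobj". The function name image_filters and the sample bytes are invented for the example.

```python
import re


def image_filters(pdf):
    """Return the /Filter names of each image XObject found in `pdf` (bytes)."""
    found = []
    for chunk in pdf.split(b"endobj"):
        # Keep only the dictionary part that precedes the stream data.
        header = chunk.split(b"stream", 1)[0]
        if not re.search(rb"/Subtype\s*/Image\b", header):
            continue
        # /Filter is either a single name or an array of names.
        m = re.search(rb"/Filter\s*(\[[^\]]*\]|/\w+)", header)
        names = re.findall(rb"/(\w+)", m.group(1)) if m else []
        found.append([n.decode("ascii") for n in names])
    return found


# Hand-written fragment; the stream payload is a placeholder, not real data.
sample = (
    b"1 0 obj\n"
    b"<< /Type /XObject /Subtype /Image /Width 2 /Height 2\n"
    b"   /ColorSpace /DeviceGray /BitsPerComponent 8\n"
    b"   /Filter /FlateDecode /Length 4 >>\n"
    b"stream\n\x00\x7f\x7f\x00\nendstream\n"
    b"endobj\n"
)
print(image_filters(sample))  # [['FlateDecode']]
```

A filter such as /FlateDecode or /CCITTFaxDecode means the samples were stored in PDF's own container, while /DCTDecode means the stream holds JPEG-compressed data. Either way only pixel data is kept, which would match the lost comment above.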

cheers

Jules
