Question about PDF manipulation

2 Jun 2005

On Thu, 2005-06-02 at 16:58 -0400, Paul Koning wrote:
...

>>>> "Jan-Benedict" == Jan-Benedict Glaw <jbglaw at lug-owl.de> writes:
 > Ghostscript reads PDF files every bit as well as PS files, and
 > it's open source...
  Jan-Benedict> You didn't answer my question:-) Consider I prepare a
  Jan-Benedict> TIFF file that contains (with additional tags) eg. some
  Jan-Benedict> raw OCRed text, not read-checked. Now I prepare a PDF
  Jan-Benedict> from this and use gs to get the image back.  Is my text
  Jan-Benedict> still there? Or do I get an image that "looks" almost
  Jan-Benedict> the original, but doesn't contain my extra-data?

 Oh.  I didn't know TIFF could do that; I certainly would never store
 text in a TIFF file, no more than I would store images in a DOC file.

It's a pretty cool format for that kind of thing; I suppose, like HTML,
there is a bare minimum of tags which any decoder should support, and a
decoder should be able to skip over anything it can't handle and still
output an image.

It tends to be let down by bad decoder code though - in particular
decoders typically either can't handle multi-page images (or do so
badly), or they don't support all the common compression schemes out
there. 
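
To make the tag-skipping and multi-page points concrete, here is a rough
sketch (not taken from any real decoder) of walking the IFD chain of a
classic TIFF file in Python. The function name walk_tiff, the KNOWN_TAGS
table and the shape of the returned dicts are my own inventions for
illustration. The sketch only decodes single SHORT/LONG values and ignores
BigTIFF, so treat it as a toy under those assumptions.

```python
import struct

# A small, illustrative subset of baseline tags; a real decoder knows more.
KNOWN_TAGS = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    273: "StripOffsets",
    279: "StripByteCounts",
}


def walk_tiff(data):
    """Return one dict per IFD (page) with known tags decoded and
    unknown tag numbers merely recorded as skipped."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    pages = []
    seen = set()  # guard against a looping IFD chain
    while offset != 0 and offset not in seen:
        seen.add(offset)
        (count,) = struct.unpack_from(order + "H", data, offset)
        page = {"known": {}, "skipped": []}
        for i in range(count):
            entry = offset + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ, n = struct.unpack_from(order + "HHI", data, entry)
            if tag not in KNOWN_TAGS:
                # Unknown or private tag: note it and move on.
                page["skipped"].append(tag)
                continue
            value = None
            if n == 1 and typ == 3:    # one SHORT stored inline
                (value,) = struct.unpack_from(order + "H", data, entry + 8)
            elif n == 1 and typ == 4:  # one LONG stored inline
                (value,) = struct.unpack_from(order + "I", data, entry + 8)
            page["known"][KNOWN_TAGS[tag]] = value
        pages.append(page)
        # The 4 bytes after the last entry point at the next IFD (0 = end).
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
    return pages
```

Each element of the returned list corresponds to one page, so a decoder
that stops after the first IFD is the "can't handle multi-page" case. Any
extra data stored under a private tag shows up in "skipped" rather than
breaking the walk.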

cheers

Jules
