> [...] But first, I'd need to understand PDF (whose specification
> actually is about 8cm thick...)
1236 pages, according to GhostScript run on the PDF for the 1.6 spec.
> Doesn't this sort of imply that PDF is the wrong choice of format for
> jobs like these?
Yes, if it needed further implying.
> (plus I'm pissed at Adobe because their current reader for Linux eats
> close on 100MB of disk space just to let me read a PDF file :-)
So don't use Adobe's reader. Ghostscript generally does a pretty good
job in my experience.
> It might be good for text-based documents (offering text searching
> and the like),
...though still not as good as a plain text file...
> but is it necessarily the right thing for collections of page scans?
"The" right thing? I dunno, but I do know I'd sure rather have a
directory containing TIFF files than a PDF containing the same TIFF
files. Or a tar of them, or almost anything else non-proprietary - the
less information I have to understand to figure out how to pull it
apart, the better, on which score PDF (as pointed out above) fails
rather dismally. (It may be an open format in that the doc is
unrestricted - though I'm not sure of even that much - but for an open
format, its doc is not easy to find. Google finds
http://www.adobe.com/products/acrobat/adobepdf.html, which is full of
puffery and notably short on links to fetch the spec itself from; I
eventually managed to scare it up - and it's a PDF. "Here! If you can
figure this file out, it will tell you how to figure it out!" As if
that weren't enough, Adobe prohibits saving a copy of it(!!); the
second page includes [ten-finger copy] "No part of this publication
(whether in hardcopy or electronic form) may be reproduced, stored in a
retrieval system, or transmitted, in any form, or by any means,
electronic, mechanical, photocopying, recording, or otherwise, without
the prior written permission of Adobe Systems Incorporated.".)
Writing one's own tar unpacker is a contemplatable exercise - a fairly
easy one, even, for a reasonably experienced programmer. Writing a PDF
groveler...is not.
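
To give a sense of scale for the tar half of that comparison, here is a
minimal sketch of such an unpacker in Python. It is not code from this
thread, and it rests on stated assumptions: the archive is uncompressed,
uses the classic 512-byte header layout (name in bytes 0-99, octal size in
bytes 124-135, type flag in byte 156), and has no long-name extensions,
ustar prefix field, or base-256 sizes. It also skips checksum verification.
The function name iter_tar_members is made up for illustration.

```python
BLOCK = 512  # tar headers and data are laid out in 512-byte blocks


def iter_tar_members(data: bytes):
    """Yield (name, payload) for each regular file in an uncompressed tar.

    Sketch only: ignores the ustar prefix field, GNU/pax long names,
    base-256 size encoding, and the header checksum.
    """
    offset = 0
    while offset + BLOCK <= len(data):
        header = data[offset:offset + BLOCK]
        if header == bytes(BLOCK):
            break  # an all-zero block marks the end of the archive

        # Name: NUL-terminated string in the first 100 bytes.
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")

        # Size: octal ASCII digits, padded with NULs and/or spaces.
        size_field = header[124:136].strip(b"\0 ").decode("ascii")
        size = int(size_field, 8) if size_field else 0

        typeflag = header[156:157]
        offset += BLOCK

        # b"0" (or NUL in very old archives) means a regular file.
        if typeflag in (b"0", b"\0"):
            yield name, data[offset:offset + size]

        # File data is padded up to a whole number of blocks.
        offset += (size + BLOCK - 1) // BLOCK * BLOCK
```

Pointed at a tar of page scans (say a hypothetical scans.tar), it would be
used as: for name, payload in iter_tar_members(open("scans.tar", "rb").read()),
writing each payload out under its name. The whole format fits in a screenful,
which is the point being made about how little one has to understand.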
/~\ The ASCII                             der Mouse
\ / Ribbon Campaign
 X  Against HTML               mouse at rodents.montreal.qc.ca
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B