> Doesn't this sort of imply that PDF is the wrong choice of format for
> jobs like these? PDF is a terrible way to package the documents.
It's just better than any other practical method. ;-)
So many people talk about ASCII being the only "right way", but as Al
can attest, time and accuracy make image-oriented PDFs the way to go.
I'm pragmatic. I have hundreds of thousands (probably millions now) of
pages of paper, and I wanted easy access to all of it. Eric also had a
lot of stuff that I didn't have. The machines I use are Macs and Linux
boxes, and there are PDF viewers with page indexing for both platforms.
The PDF I produce is the minimum needed to wrap a collection of scans
together with very simple metadata (the page number). That's all of PDF
that I use. The overhead is minimal, it can be read on most computers
currently in use, and with the right sort of browser only the page being
viewed is transferred instead of the whole document.
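For anyone curious what "the minimum" amounts to, here is a rough Python
sketch (not tumble's actual output) that wraps raw scans into a PDF with
one page per image and a /PageLabels entry carrying the page numbers.
Assumptions: build_pdf is a hypothetical helper; the scans are
uncompressed 8-bit grayscale for simplicity (a real archive would use a
compressed image filter such as CCITTFaxDecode for bilevel scans); and
the file is not linearized, which is the extra step that lets a browser
fetch one page at a time.

```python
from typing import List, Tuple

def build_pdf(scans: List[Tuple[int, int, bytes]], first_page_number: int = 1) -> bytes:
    """Wrap (width, height, 8-bit gray pixels) scans into a minimal PDF.

    Hypothetical sketch: one image per page, decimal page labels, no compression.
    """
    objects: List[bytes] = []  # object N lives at objects[N - 1]

    def add(body: bytes) -> int:
        objects.append(body)
        return len(objects)

    add(b"")  # 1: catalog, filled in once the pages exist
    add(b"")  # 2: page tree, filled in once the pages exist
    page_ids = []
    for width, height, pixels in scans:
        if len(pixels) != width * height:
            raise ValueError("pixel data does not match width * height")
        image_id = add(
            b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Length %d >>\n"
            b"stream\n" % (width, height, len(pixels)) + pixels + b"\nendstream"
        )
        # Scale the unit-square image to fill the page (1 point per pixel).
        content = b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (width, height)
        content_id = add(
            b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream"
        )
        page_ids.append(add(
            b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
            b"/Resources << /XObject << /Im0 %d 0 R >> >> /Contents %d 0 R >>"
            % (width, height, image_id, content_id)
        ))

    kids = b" ".join(b"%d 0 R" % p for p in page_ids)
    objects[1] = b"<< /Type /Pages /Kids [%s] /Count %d >>" % (kids, len(page_ids))
    # The only metadata: decimal page labels starting at first_page_number.
    objects[0] = (b"<< /Type /Catalog /Pages 2 0 R "
                  b"/PageLabels << /Nums [0 << /S /D /St %d >>] >> >>" % first_page_number)

    # Serialize objects, recording byte offsets for the cross-reference table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)
```

Calling build_pdf([(8, 8, b"\xff" * 64), (8, 8, b"\x00" * 64)]) gives a
two-page file (one white page, one black) whose pages a viewer will label
1 and 2. Since the whole structure is a handful of dictionaries, this is
also why translating such files to some future format should be easy.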
If something better comes along, the PDF spec is public, and since I use
such a small subset of it, converting should be simple.
I've never found much use for the fancier scripting features in tumble.
The minimal post-processing I do on the scans now takes long enough
without also writing a script for each document to do any sort of
fancier indexing.