Better indexing on bitsavers

19 May 2005

...
> Doesn't this sort of imply that PDF is the wrong choice of format for
> jobs like these? PDF is a terrible way to package the documents.
...
 It's just better than any other practical method ;-)
...
 So many talk about ASCII being the only "right way", but as Al can attest,
time and accuracy make image-oriented PDFs the way to go.
I'm pragmatic. I have hundreds of thousands (probably millions now..) of
pages of paper. I wanted easy access to all of it. Eric also had a lot of
stuff that I didn't have. The machines I use are Macs and Linux boxes.
There are PDF viewers for both of those platforms that had page indexing.
The level of PDF used is the minimum to support wrapping collections of
scans together with very simple metadata (the page number). That's all
of PDF that I use. The overhead is minimal, it can be read on most
computers currently in use, and with the right sort of browser only the
page being viewed is transferred instead of the whole document.
If something different comes around, the PDF spec is public, and by using
such a small subset it should be simple to translate.
I've never found much use for the fancier scripting stuff in tumble. It
takes long enough just to do the minimal post-processing of the scans that
I do now, without also writing a script for each document to do any sort
of fancier indexing.
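
To make the "small subset" point above concrete, here is a rough sketch of
how little PDF is needed to wrap page scans with page-number metadata. It is
not tumble's code or output; the function name build_scan_pdf and its
parameters are made up for illustration, and the images are stored as raw
uncompressed 1-bit data only to keep the example self-contained (real scans
would normally be compressed).

```python
def _stream(dict_entries: bytes, payload: bytes) -> bytes:
    """Wrap a payload as a PDF stream object body."""
    return (b"<< " + dict_entries + b" /Length %d >>\nstream\n" % len(payload)
            + payload + b"\nendstream")


def build_scan_pdf(pages, dpi=300, first_label=1):
    """Build a PDF from 1-bit scans (hypothetical helper, illustration only).

    pages: list of (width_px, height_px, data), where data holds raw 1-bit
    rows padded to whole bytes (0 = black, 1 = white in DeviceGray).
    first_label: the number shown for the first page in a viewer.
    """
    objs = {}   # object number -> object body
    kids = []   # references to the page objects
    for i, (w, h, data) in enumerate(pages):
        if len(data) != ((w + 7) // 8) * h:
            raise ValueError("page %d: unexpected image data length" % i)
        # Objects 1 and 2 are the catalog and page tree; 3 objects per page.
        page_no, content_no, image_no = 3 + 3 * i, 4 + 3 * i, 5 + 3 * i
        pt_w, pt_h = w * 72.0 / dpi, h * 72.0 / dpi   # pixels -> points
        objs[page_no] = (
            b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %.2f %.2f] "
            b"/Resources << /XObject << /Im0 %d 0 R >> >> /Contents %d 0 R >>"
            % (pt_w, pt_h, image_no, content_no))
        # Content stream: scale the unit-square image to fill the page.
        objs[content_no] = _stream(
            b"", b"q %.2f 0 0 %.2f 0 0 cm /Im0 Do Q" % (pt_w, pt_h))
        objs[image_no] = _stream(
            b"/Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceGray /BitsPerComponent 1" % (w, h), data)
        kids.append(b"%d 0 R" % page_no)

    # The only metadata: decimal page labels starting at first_label.
    objs[1] = (b"<< /Type /Catalog /Pages 2 0 R /PageLabels "
               b"<< /Nums [0 << /S /D /St %d >>] >> >>" % first_label)
    objs[2] = (b"<< /Type /Pages /Kids [" + b" ".join(kids)
               + b"] /Count %d >>" % len(pages))

    out = bytearray(b"%PDF-1.3\n")
    offsets = []
    for num in range(1, len(objs) + 1):
        offsets.append(len(out))   # byte offset recorded for the xref table
        out += b"%d 0 obj\n" % num + objs[num] + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objs) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objs) + 1, xref_pos))
    return bytes(out)
```

Writing the returned bytes to a file in binary mode should give something a
PDF viewer can open. Everything beyond the page tree, one image per page and
the page labels is left out, which is roughly why translating such files to
another container would not be much work.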
