Better indexing on bitsavers

19 May 2005

On Thu, 2005-05-19 at 16:02 -0700, Al Kossow wrote:
...
> Doesn't this sort of imply that PDF is the wrong choice of format for
> jobs like these?
>
> PDF is a terrible way to package the documents. It's just better than
> any other practical method ;-) So many talk about ASCII being the only
> "right way"; as Al can attest to, time and accuracy make
> image-oriented PDFs the way to go.
>
> I'm pragmatic. I have hundreds of thousands (probably millions now...) of
> pages of paper. I wanted easy access to all of it. Eric also had a lot of
> stuff that I didn't have. The machines I use are Macs and Linux boxes.
> There are PDF viewers for both of those platforms that had page indexing.
Sure... I'm not sure what it is about PDF that bugs me (I'm not that
much of a fan of it even for textual stuff TBH!). In the case of doc
scans it just feels a bit odd downloading a bunch of images wrapped in a
proprietary format, I suppose, when there are far nicer tools for dealing
with images than there are PDF viewers for handling PDF content (and
because the PDF files are just full of images, there's no other
metadata).

I tend to 'explode' any PDF files of scans (from whatever source) into
their own directory once downloaded; I just find the images easier to
manipulate via whatever image tool is most suitable for whatever I'm
doing at the time, rather than being stuck with a PDF viewer. I suppose
if I wanted to add metadata to that, I'd include an ASCII text file in
the directory full of images with the relevant info in it (I've done that
with ROM and disk images many a time; I haven't needed to do it with doc
scans yet*).

* For single-page magazine ad scans I've tended to enlarge the image
canvas a little at the top or bottom with a blank area and put details
there about what magazine it was from, the date, etc. There are likely
much better ways (and that obviously doesn't scale to thousands of
docs!) but it's been better than nothing up until now.

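The "directory of page images plus an ASCII text file" arrangement described above can be sketched in a few lines of Python. This assumes the PDF has already been exploded into one image file per page; the file name `INFO.TXT`, the helpers `write_index` and `natural_key`, the field names, and the set of image suffixes are hypothetical choices for illustration, not an established convention on bitsavers or anywhere else.

```python
import re
from pathlib import Path

# Hypothetical choices for illustration only.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".gif"}


def natural_key(name: str) -> list:
    """Sort key so that 'page10.tif' comes after 'page9.tif'."""
    # re.split with a capturing group alternates text (even positions) and
    # digit runs (odd positions), so strings and ints always line up.
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(re.split(r"(\d+)", name))]


def write_index(directory: str, fields: dict[str, str],
                index_name: str = "INFO.TXT") -> Path:
    """Write a plain ASCII text file describing the page images in `directory`."""
    folder = Path(directory)
    pages = sorted(
        (p.name for p in folder.iterdir()
         if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES),
        key=natural_key,
    )
    lines = [f"{key}: {value}" for key, value in fields.items()]
    lines.append(f"Pages: {len(pages)}")
    lines.append("")  # blank line between the header fields and the page list
    lines.extend(f"{n:4d}  {name}" for n, name in enumerate(pages, start=1))
    out = folder / index_name
    # encoding="ascii" raises UnicodeEncodeError on non-ASCII metadata,
    # keeping the file readable by the simplest tools.
    out.write_text("\n".join(lines) + "\n", encoding="ascii")
    return out
```

Calling something like `write_index("scans/some-manual", {"Title": "...", "Source": "..."})` would leave a small text file next to the images; the natural sort matters because plain string sorting would put `page10.tif` before `page2.tif`.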
Maybe I'm atypical in usage :-) I'll rarely want to download scans and
*not* keep a copy on local storage just in case, so I've never used the
"view a PDF file in a web browser" side of things.
...
> If something different comes around, the PDF spec is public, and by
> using such a small subset it should be simple to translate.

Yep true... plenty of tools already exist to pull PDF files apart. Well,
you'll be converting all your bitsavers content to futurekeep format
soon :-)
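Al's point that scan-only PDFs use only a small subset of the format can be illustrated with a rough Python sketch that inventories the image objects in such a file and reports which compression filter each one uses. It is a heuristic byte scan, not a PDF parser: it assumes image dictionaries appear as plain text (stream objects are not allowed inside compressed object streams), reads only the first entry of a filter array, and treats indirect `/Width` or `/Height` values as unknown. The helper names are hypothetical; for real work one of the existing PDF tools is the better choice.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Heuristic: an indirect object "N G obj << ... >>" whose dictionary is
# immediately followed by the "stream" keyword. The tempered (?!endobj)
# pattern stops one match from running across into the next object.
STREAM_OBJ = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*<<((?:(?!endobj).)*?)>>\s*stream", re.DOTALL)
IS_IMAGE = re.compile(rb"/Subtype\s*/Image\b")
# Captures a single filter name, or only the first entry of a filter array.
FIRST_FILTER = re.compile(rb"/Filter\s*\[?\s*/(\w+)")


@dataclass
class ImageInfo:
    obj_num: int
    filter_name: Optional[str]
    width: Optional[int]
    height: Optional[int]


def int_entry(key: bytes, dictionary: bytes) -> Optional[int]:
    """Return a direct integer value for /key; None if absent or indirect (N G R)."""
    m = re.search(rb"/" + key + rb"\s+(\d+)(\s+\d+\s+R\b)?", dictionary)
    if m is None or m.group(2) is not None:
        return None
    return int(m.group(1))


def list_images(pdf_bytes: bytes) -> list[ImageInfo]:
    """Scan raw PDF bytes for image XObjects and report their filters and sizes."""
    images = []
    for m in STREAM_OBJ.finditer(pdf_bytes):
        d = m.group(3)
        if not IS_IMAGE.search(d):
            continue  # fonts, page content streams, etc.
        f = FIRST_FILTER.search(d)
        images.append(ImageInfo(
            obj_num=int(m.group(1)),
            filter_name=f.group(1).decode("ascii") if f else None,
            width=int_entry(b"Width", d),
            height=int_entry(b"Height", d),
        ))
    return images


def filter_summary(images: list[ImageInfo]) -> Counter:
    """Count images per compression filter, e.g. CCITT G4 vs JPEG (DCT)."""
    return Counter(img.filter_name or "(none)" for img in images)
```

Run against a downloaded scan with something like `filter_summary(list_images(open(path, "rb").read()))`. For an image-only PDF one would expect roughly one image per page and only a handful of distinct filters, which is what makes translating such files to some future format look tractable; that is an expectation, not a guarantee.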
cheers
Jules
