One extra caveat: when listing page numbers, both the printed
page numbers and the PDF's declared page numbers should be included.
When I started building bitsavers, I didn't have an easy way to put the
page numbers on each page in their original format. Once
Eric had tumble working, I started bookmarking every page with the
'real' page number, with the intent of fixing the PDF page to match.
Al can decide whether it should be plain text, HTML, Excel,
etc.
ASCII, space-separated fields.
Actually, I wasn't totally out of computing since January...
I built a 400+ GB music archive at the radio station where I
do a weekly show, and built the master indexes as text
files. They are post-processed into SQL form, but the original
data is all easily readable and editable.
It is really, really easy to manipulate the data later if you
do this.
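To illustrate why plain ASCII, space-separated index files are easy to
post-process, here is a minimal sketch. The record layout (PDF page,
printed page, then a free-text title as the last field) and the table
name "pages" are hypothetical assumptions, not the actual bitsavers or
radio-station format. The printed page is kept as a string because
printed numbers such as "iii" or "1-1" are not integers, which is
exactly why it should be listed alongside the PDF's own page number.

```python
# Hypothetical index layout (an assumption for illustration only):
# one record per line, ASCII, space-separated fields:
#   <pdf_page> <printed_page> <title words...>
# The title is the last field, so it may itself contain spaces.

def parse_index(text):
    """Turn each non-blank, non-comment line into a dict of fields."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # maxsplit=2 keeps everything after the second field as the title
        parts = line.split(maxsplit=2)
        records.append({
            "pdf_page": int(parts[0]),      # physical page in the PDF
            "printed_page": parts[1],       # as printed: "iii", "1-1", ...
            "title": parts[2] if len(parts) > 2 else "",
        })
    return records


def to_sql(records, table="pages"):
    """Emit one INSERT per record; 'pages' is a hypothetical table name."""
    stmts = []
    for r in records:
        # Double any single quotes so the string literals stay valid SQL
        printed = r["printed_page"].replace("'", "''")
        title = r["title"].replace("'", "''")
        stmts.append(
            f"INSERT INTO {table} (pdf_page, printed_page, title) "
            f"VALUES ({r['pdf_page']}, '{printed}', '{title}');"
        )
    return stmts


sample = """\
# pdf printed title
1 i Title page
7 1-1 Chapter 1 Introduction
"""

if __name__ == "__main__":
    for stmt in to_sql(parse_index(sample)):
        print(stmt)
```

The text file stays the master copy that anyone can read and edit; the
SQL is just a regenerated by-product. For loading into a real database,
parameterized queries would be safer than building statements as strings.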