Better indexing on bitsavers

19 May 2005

From: "Al Kossow" <aek at bitsavers.org>
Sent: Thursday, May 19, 2005 4:12 PM
...
  On a separate post I mentioned cross support/cross linking.  It was my
 clumsy way of saying indexing.  It would be nice if people pitched in and
 just did it.  I may make a list of all of the chips listed in the individual
 PDFs that Al has posted for the westerndigital datasheets.  If Al then
 posts this index with the PDFs (or creates an index folder) so the
 googlebot can scan it, then a Google search would point you to where to
 get it.
 Simple, with the task easily shared among many people.
 --
 That would be a wonderful thing. I have a HUGE backlog of scanned databook
 material, and just finished picking up almost 40 book boxes
 of 70's -> 90's data books from a third large collection.
 The first was from the databook collection of Haltek Electronics (RIP),
 the second from a private collection that was given to us with the promise
 that it would be scanned, and now this additional one.
 (I've found a few interesting things in the last lot already: a copy
  of the Fairchild '69 data book, a book by Gnostic Concepts on early
  70's memory technology, and two of the classic error correcting codes
  books)
 There is no way I'm going to have time to OCR or index this. A simple
 text file per PDF with part number and page number would be wonderful.
 This is also the sort of data that Google seems to index REALLY well.
 Watching the hits on bitsavers, almost everyone finds the archive by
 stumbling upon the 'whatsnew.txt' or 'Index.txt' files.
 I'd be interested in suggestions for what books should be higher on the
 post-processing queue too. I probably have 50 databooks scanned but not
 PDFed right now. I've been concentrating mostly on getting the classic
 early stuff done first (2nd Edition TI TTL Data Book, etc.) 
One extra caveat: when listing page numbers, both the printed page
numbers and the PDF's declared page numbers should be included.
As I said, when someone downloads a copy to look for something and there is
no index, just make one and send it back to Al.  By sharing the load, no one
has to "do it all right now by yourself".
I highly recommend a file that could be concatenated into a "master index",
so each entry should have the PDF filename plus page numbers.  Each index
should be named the same as the PDF with a different extension, and the
field widths should be standardized.
Al can decide if it should be text only, html, excel, etc.  If there is a
vote, I vote text only, either comma delimited (with quotes when needed) or
fixed-width fields, which would be easier to read.
Randy
www.s100-manuals.com
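
The per-PDF index and "master index" idea above can be sketched roughly as follows. This is only an illustration, not a format anyone in the thread settled on: the `.idx` extension, the column order (part number, printed page, PDF page), the field widths, and the sample filenames and part numbers are all assumptions made up for the example.

```python
import csv
import io
import os

# Hypothetical per-PDF index files, keyed by index filename.
# Assumed columns: part number, printed page number, PDF page number.
# Comma delimited, with quotes where needed, as suggested in the post.
SAMPLE_INDEXES = {
    "example_databook_a.idx": 'PART-0001,"1-15",23\nPART-0002,"1-31",39\n',
    "example_databook_b.idx": 'PART-0100,"4-2",118\n',
}


def pdf_name_for(index_name):
    """Each index is named like its PDF, with a different extension."""
    base, _ext = os.path.splitext(index_name)
    return base + ".pdf"


def build_master_index(indexes):
    """Concatenate per-PDF indexes into one list of rows.

    Each row is (part number, PDF filename, printed page, PDF page),
    so an entry still says which PDF it came from after merging.
    """
    rows = []
    for index_name in sorted(indexes):
        reader = csv.reader(io.StringIO(indexes[index_name]))
        for record in reader:
            if not record:
                continue  # skip blank lines
            part, printed_page, pdf_page = record
            rows.append(
                (part.strip(), pdf_name_for(index_name),
                 printed_page.strip(), int(pdf_page))
            )
    # Sort by part number, then by PDF filename.
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def format_fixed_width(rows):
    """Render rows as fixed-width text (widths 16, 32, 10, 6 are assumed)."""
    return "\n".join(
        f"{part:<16}{pdf:<32}{printed:<10}{pdf_page:>6}"
        for part, pdf, printed, pdf_page in rows
    )


if __name__ == "__main__":
    print(format_fixed_width(build_master_index(SAMPLE_INDEXES)))
```

Keeping both page numbers in every row covers the caveat about printed versus PDF page numbers, and carrying the PDF filename in each row is what lets the individual files be concatenated without losing track of the source.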
