From: "Jules Richardson" <julesrichardsonuk at yahoo.co.uk>
Sent: Thursday, May 19, 2005 5:20 PM
On Thu, 2005-05-19 at 23:46 +0200, Jan-Benedict Glaw wrote:
On Thu, 2005-05-19 16:32:46 -0500, Randy McLaughlin <cctalk at randy482.com> wrote:
One extra caveat: when listing page numbers, both the printed page numbers and the PDF's declared page numbers should be included.
Some time ago, I worked on a TeX skeleton (generated with the aid of a script) to produce a PDF file with nice bookmarks and the like. However, I came to the conclusion that this isn't a real solution.
I'm still thinking about how paper-based documentation could be assembled cleverly enough to capture text as well as images, with meta-data mixed in. Maybe I'll do some C programming and hack together something nice that produces PDF files holding everything? But first, I'd need to understand PDF (whose specification is actually about 8cm thick...)
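For what it's worth, the "TeX skeleton" approach can be sketched in a few lines of LaTeX: wrap each scanned page image in a page of its own and attach a bookmark via hyperref's \pdfbookmark. This is only a minimal sketch; the image file names and chapter titles below are made up for illustration.

```latex
% Minimal sketch: scanned pages as full-page images, with PDF bookmarks.
% File names (scan-p01.png, ...) and titles are hypothetical examples.
\documentclass{article}
\usepackage[margin=0pt]{geometry}  % no margins around the scans
\usepackage{graphicx}
\usepackage{hyperref}              % provides \pdfbookmark
\pagestyle{empty}
\begin{document}
\pdfbookmark[0]{Chapter 1: Introduction}{ch1}
\noindent\includegraphics[width=\paperwidth]{scan-p01.png}\newpage
\pdfbookmark[0]{Chapter 2: Installation}{ch2}
\noindent\includegraphics[width=\paperwidth]{scan-p02.png}\newpage
\end{document}
```

Running this through pdflatex yields one PDF page per scan, with a clickable bookmark outline; a generating script would only need to emit the \pdfbookmark/\includegraphics pairs.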
Doesn't this sort of imply that PDF is the wrong choice of format for jobs like these? (Plus I'm pissed at Adobe because their current reader for Linux eats close to 100MB of disk space just to let me read a PDF file :-)
It might be good for text-based documents (offering text searching and
the like), but is it necessarily the right thing for collections of page
scans?
cheers
J.
PDF is a terrible way to package documents. It's just better than any other practical method ;-)
Many people talk about ASCII being the only "right way", but as Al can attest, time and accuracy make image-oriented PDFs the way to go.
Finding errors in OCR'ed files is extremely time-consuming.
Randy