Scanning docs for bitsavers

Grant Taylor cctalk at
Mon Dec 2 20:08:38 CST 2019

On 12/2/19 5:34 PM, Guy Dunphy via cctalk wrote:

Interesting comments Guy.

I'm completely naive when it comes to scanning things for preservation.
Your comments do pass my naive understanding.

> But PDF literally cannot be used as a wrapper for the results, 
> since it doesn't incorporate the required image compression formats. 
> This is why I use things like html structuring, wrapped as either a zip 
> file or RARbook format. Because there is no other option at present. 
> There will be eventually. Just not yet. PDF has to be either greatly 
> extended, or replaced.

I *HATE* doing anything with PDFs other than reading them.  My opinion 
is that PDF is where information goes to die.  Creating the PDF was the 
last time that anything other than a human could use the information as 
a unit.  From then on, it's all chopped-up lines of text that may 
be in a nonsensical order.  I believe it will take humans (or something 
yet to be created with human-like ability) to make sense of the content 
and recreate it in a new form for further consumption.

Have you looked at ePub at all?  My understanding is that an ePub file 
is a zip of a directory structure of HTML and associated files.  That 
sounds quite similar to what you're describing.
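
To make the "zip of a directory structure" idea concrete, here is a rough 
Python sketch, not a validated ePub writer.  It builds a tiny ePub-style 
archive in memory and lists what is inside.  The OEBPS directory name, the 
file names, and both helper functions are my own illustrative assumptions; 
the "mimetype" entry and META-INF/container.xml are the parts I'm confident 
the container format expects.  The OPF and XHTML bodies are placeholders.

```python
import io
import zipfile


def build_epub_like_zip() -> bytes:
    """Build a minimal ePub-style archive in memory (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The container expects "mimetype" first, stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # container.xml points readers at the package (OPF) file.
        zf.writestr(
            "META-INF/container.xml",
            '<?xml version="1.0"?>\n'
            '<container version="1.0" '
            'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">\n'
            '  <rootfiles>\n'
            '    <rootfile full-path="OEBPS/content.opf" '
            'media-type="application/oebps-package+xml"/>\n'
            '  </rootfiles>\n'
            '</container>\n',
        )
        # Hypothetical placeholder payload: a package file and one page.
        zf.writestr("OEBPS/content.opf", "<package/>")
        zf.writestr(
            "OEBPS/chapter1.xhtml",
            "<html><body><p>Scanned page text goes here.</p></body></html>",
        )
    return buf.getvalue()


def list_entries(data: bytes) -> list:
    """Return the archive's member names in the order they were written."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    for name in list_entries(build_epub_like_zip()):
        print(name)
```

Page images could presumably sit alongside the XHTML files as ordinary zip 
members, much like the html-in-a-zip approach you describe.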

> And that's why I get upset when people physically destroy rare old 
> documents during or after scanning them currently. It happens so 
> frequently, that by the time we have a technically adequate document 
> coding scheme, a lot of old documents won't have any surviving 
> paper copies.  They'll be gone forever, with only really crap quality 
> scans surviving.

Fair enough.

Grant. . . .
unix || die
