Scanning docs for bitsavers

Alexandre Souza alexandre.tabajara at gmail.com
Mon Dec 2 21:20:15 CST 2019


I cannot understand your problems with PDF files.
I've created lots and lots of PDFs, with treated and untreated scanned
material. All of them are very readable and have been in use for years. Of course,
garbage in, garbage out. I take the utmost care in my scans to have good
enough source files, so I can create great PDFs.

Of course, Guy's comments are very informative and I'll learn more from them.
But I still believe in good preservation using PDF files. FOR ME it is the
best we have for encapsulating info. Forget HTML.

Please, take a look at this PDF, and tell me: Isn't that good enough for
preservation/use?
https://drive.google.com/file/d/0B7yahi4JC3juSVVkOEhwRWdUR1E/view

Thanks
Alexandre

---8<---Cut here---8<---
http://www.tabajara-labs.blogspot.com
http://www.tabalabs.com.br
---8<---Cut here---8<---


On Tue, Dec 3, 2019 at 00:08, Grant Taylor via cctalk <
cctalk at classiccmp.org> wrote:

> On 12/2/19 5:34 PM, Guy Dunphy via cctalk wrote:
>
> Interesting comments Guy.
>
> I'm completely naive when it comes to scanning things for preservation.
>   Your comments do pass my naive understanding.
>
> > But PDF literally cannot be used as a wrapper for the results,
> > since it doesn't incorporate the required image compression formats.
> > This is why I use things like html structuring, wrapped as either a zip
> > file or RARbook format. Because there is no other option at present.
> > There will be eventually. Just not yet. PDF has to be either greatly
> > extended, or replaced.
>
> I *HATE* doing anything with PDFs other than reading them.  My opinion
> is that PDF is where information goes to die.  Creating the PDF was the
> last time that anything other than a human could use the information as
> a unit.  Now, in the future, it's all chopped up lines of text that may
> be in a nonsensical order.  I believe it will take humans (or something
> yet to be created with human-like ability) to make sense of the content
> and recreate it in a new form for further consumption.
>
> Have you done any looking at ePub?  My understanding is that they are a
> zip of a directory structure of HTML and associated files.  That sounds
> quite similar to what you're describing.
>
> > And that's why I get upset when people physically destroy rare old
> > documents during or after scanning them currently. It happens so
> > frequently, that by the time we have a technically adequate document
> > coding scheme, a lot of old documents won't have any surviving
> > paper copies.  They'll be gone forever, with only really crap quality
> > scans surviving.
>
> Fair enough.
>
>
>
> --
> Grant. . . .
> unix || die
>

