Scanning docs for bitsavers

Grant Taylor cctalk at gtaylor.tnetconsulting.net
Mon Dec 2 22:12:16 CST 2019


On 12/2/19 9:06 PM, Grant Taylor via cctalk wrote:
> In my opinion, PDFs are the last place that computer usable data goes. 
> Because getting anything out of a PDF as a data source is next to 
> impossible.
> 
> Sure, you, a human, can read it and consume the data.
> 
> Try importing a simple table from a PDF and working with the data in 
> something like a spreadsheet.  You can't do it.  The raw data is there. 
>  But you can't readily use it.
> 
> This is why I say that a PDF is the end of the line for data.
> 
> I view it as effectively impossible to take data out of a PDF and do 
> anything with it without first needing to reconstitute it before I can 
> use it.

I'll add this:

PDF is a decent page layout format.  But trying to view the contents in 
any other layout is problematic (at best).

Trying to use the result of a page layout as a data source is ... 
problematic.
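
To illustrate what "reconstitute" means here, below is a minimal sketch in 
Python.  It assumes (my assumption, for illustration only) that the text 
pulled off a page arrives as loose fragments with x/y positions and no idea 
of rows or cells.  The fragment data, the rebuild_rows name and the 
y_tolerance parameter are all hypothetical, not taken from any PDF library.

```python
# Hypothetical positioned text fragments, roughly what a page layout
# stores: (x, y, text), in arbitrary order, with no rows or cells.
fragments = [
    (300.0, 700.2, "Qty"),
    (72.0, 700.0, "Part"),
    (72.0, 684.1, "Fuse"),
    (300.0, 668.0, "2"),
    (300.0, 684.0, "10"),
    (72.0, 668.3, "Lamp"),
]


def rebuild_rows(fragments, y_tolerance=2.0):
    """Group fragments into rows by similar y, then order each row by x."""
    rows = []  # list of (row_y, [(x, text), ...])
    # Assumes y grows upward on the page, so sort descending to read
    # top to bottom.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))  # same baseline, same row
        else:
            rows.append((y, [(x, text)]))  # new row starts here
    # Left-to-right order within each row gives the column order.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


table = rebuild_rows(fragments)
for row in table:
    print("\t".join(row))
```

Even this toy case needs a guessed tolerance to decide what counts as one 
row.  Merged cells, wrapped cell text or multiple columns of body text would 
all break it, which is the reconstitution work I'm complaining about.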



-- 
Grant. . . .
unix || die






