On Fri, 2005-05-20 at 09:31 +0200, Jan-Benedict Glaw wrote:
> On Thu, 2005-05-19 22:20:53 +0000, Jules Richardson
> <julesrichardsonuk at yahoo.co.uk> wrote:
> > On Thu, 2005-05-19 at 23:46 +0200, Jan-Benedict Glaw wrote:
> > > I'm still thinking about how paper-based documentation can be made up
> > > cleverly enough to gain text as well as images and mixing meta-data into
> > > that. Maybe I'd do some C programming and hack something nice producing
> > > PDF files holding everything? But first, I'd need to understand PDF
> > > (whose specification actually is about 8cm thick...)
> > Doesn't this sort of imply that PDF is the wrong choice of format for
> > jobs like these? (Plus I'm pissed at Adobe because their current reader
> > for Linux eats close on 100MB of disk space just to let me read a PDF
> > file :-)
> There are alternatives, like:
>  - A tarball containing all the TIFF (or whatever) images as well
>    as some (generated)
See my other post; that's my preference and what I tend to do with all
image-based PDF content I download from anywhere anyway...
>    HTML page (containing some kind of slide show) as well as a small
>    description file (use this with some program (to be written) to
>    generate the HTML file(s)).
>    This gives the chance that the description file can be done quite
>    cleverly, so you'll get e.g. a clickable index for the TIFF files
>    (though it needs to be done manually, but now this workload can
>    actually be *distributed*)
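Something along these lines would probably do for the generator side. This
is only a rough sketch, and the one-line-per-image description format (and
all the names in it) are purely my own invention:

#!/usr/bin/env python
# Sketch only: assumes a description file with one line per scanned page,
# of the form  <tiff filename>|<page title>
# (the format and all names here are made up for illustration)
import sys

def read_description(path):
    """Yield (filename, title) pairs from the description file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            filename, title = line.split("|", 1)
            yield filename.strip(), title.strip()

def write_index(pages, out="index.html"):
    """Write a clickable index pointing at the TIFF images."""
    with open(out, "w") as html:
        html.write("<html><body><h1>Contents</h1>\n<ul>\n")
        for filename, title in pages:
            html.write('<li><a href="%s">%s</a></li>\n' % (filename, title))
        html.write("</ul>\n</body></html>\n")

if __name__ == "__main__":
    write_index(read_description(sys.argv[1]))

Run that against the description file sitting alongside the TIFFs and you
get a clickable index.html out of it; the slide-show pages could be
generated the same way, and the description files themselves are small
enough that writing them is the sort of job that can be farmed out.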
One of the things that I was working on a few years back was layering
multiple delivery mechanisms over one form of content (where the dataset
was sufficiently large that storage in multiple formats wasn't
justified).
Data was kept in the "purest" form on the server side, and a client
could ask for content in whatever format they wanted (in this case raw
images, PDF, HTML etc.) and over whatever interface mechanism they
wanted (HTTP, FTP, WAP, email, network filesystem etc.).
Conversion of imagery tends to be memory-expensive, but not
computationally so (unless you're also scaling / cropping images);
caching can be added in where necessary to save on *some* resources.
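Very roughly, the conversion end looks something like this (just a sketch;
the directory layout and the use of PIL for the actual conversion are
assumptions on my part for illustration):

#!/usr/bin/env python
# Sketch of on-demand format conversion with a simple disk cache.
# Assumes PIL/Pillow for the conversion step; the paths and function
# name are made up for illustration.
import os
from PIL import Image

MASTER_DIR = "/archive/masters"   # the "purest" form, the only copy backed up
CACHE_DIR = "/var/cache/archive"  # derived formats, safe to throw away

def fetch(image_name, fmt="PNG"):
    """Return a path to image_name in the requested format, converting on demand."""
    cached = os.path.join(CACHE_DIR, "%s.%s" % (image_name, fmt.lower()))
    if not os.path.exists(cached):
        master = os.path.join(MASTER_DIR, image_name + ".tif")
        os.makedirs(CACHE_DIR, exist_ok=True)
        # Memory-hungry but cheap on CPU, as long as we don't scale or crop here.
        Image.open(master).save(cached, fmt)
    return cached

The cache is purely derived data, so it never needs backing up and can be
thrown away whenever disk gets tight.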
The other advantage is that, as a database is keeping track of what's in
the system, there are potential hooks there for indexing and maintaining
access stats.
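For instance (again only a sketch, with the table and column names
invented), the fetch routine above could bump a counter on each retrieval:

import sqlite3

def record_access(db_path, image_name, fmt):
    """Log one retrieval so the archive can report per-image access stats."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS accesses
                   (image TEXT, fmt TEXT, hits INTEGER DEFAULT 0,
                    PRIMARY KEY (image, fmt))""")
    con.execute("""INSERT INTO accesses (image, fmt, hits) VALUES (?, ?, 1)
                   ON CONFLICT (image, fmt) DO UPDATE SET hits = hits + 1""",
                (image_name, fmt))
    con.commit()
    con.close()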
I could see some of the big archives around the planet (regardless of
content) going this way in the future; user base is maximised through
offering different formats whilst the "pure" dataset is all that's
backed up and actually kept on disk.
cheers
Jules