On Fri, 2005-05-20 11:36:24 +0000, Jules Richardson <julesrichardsonuk at yahoo.co.uk> wrote:
> On Fri, 2005-05-20 at 09:31 +0200, Jan-Benedict Glaw wrote:
>> On Thu, 2005-05-19 22:20:53 +0000, Jules Richardson <julesrichardsonuk at yahoo.co.uk> wrote:
>>> On Thu, 2005-05-19 at 23:46 +0200, Jan-Benedict Glaw wrote:
>>>> I'm still thinking about how paper-based documentation can be packaged
>>>> up cleverly enough to capture text as well as images, with meta-data
>>>> mixed into that. Maybe I'd do some C programming and hack something
>>>> nice that produces PDF files holding everything? But first, I'd need
>>>> to understand PDF (whose specification is actually about 8cm thick...)
>>> Doesn't this sort of imply that PDF is the wrong choice of format for
>>> jobs like these? (Plus I'm pissed at Adobe because their current reader
>>> for Linux eats close on 100MB of disk space just to let me read a PDF
>>> file :-)
>> There are alternatives, like:
>>  - A tarball containing all the TIFF (or whatever) images as well as
>>    some (generated)
> See my other post; that's my preference and what I tend to do with all
> image-based PDF content I download from anywhere anyway...
For the record (and my education), how do you extract these?
>>    HTML page (containing some kind of slide show) as well as a small
>>    description file (use this with some program (to be written) to
>>    generate the HTML file(s)).
>>    This gives you the chance to make the description file quite clever,
>>    so you'll get e.g. a clickable index for the TIFF files (though that
>>    needs to be done manually, but now this workload can actually be
>>    *distributed*).
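
Just to make that description-file idea a bit more concrete, here's a
rough (untested) Python sketch; the input format, file names and the HTML
layout are all made up, it's only meant to show how little code that
"program (to be written)" would actually need:

  #!/usr/bin/env python
  # Sketch only: turn a hand-written description file into a clickable
  # HTML index for the TIFFs. Assumed input format, one scanned page
  # per line:  <tiff-file> | <caption>
  # e.g.:      page-003.tiff | Chapter 1: Installation
  import sys, html

  def make_index(desc_path, out_path):
      items = []
      for line in open(desc_path):
          line = line.strip()
          if not line or line.startswith("#") or "|" not in line:
              continue                     # skip blanks and comments
          image, caption = [p.strip() for p in line.split("|", 1)]
          items.append('<li><a href="%s">%s</a></li>'
                       % (html.escape(image, quote=True),
                          html.escape(caption)))
      with open(out_path, "w") as out:
          out.write("<html><body><ul>\n%s\n</ul></body></html>\n"
                    % "\n".join(items))

  if __name__ == "__main__":
      make_index(sys.argv[1], sys.argv[2])  # e.g. manual.desc index.html

The point being that all the hard work is writing the captions, and that
part can be farmed out to volunteers, one document at a time.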
> One of the things that I was working on a few years back was layering
> multiple delivery mechanisms over one form of content (where the dataset
> was sufficiently large that storage in multiple formats wasn't
> justified).
>
> Data was kept in the "purest" form on the server side, and a client
> could ask for content in whatever format they wanted (in this case raw
> images, PDF, HTML etc.) and over whatever interface mechanism they
> wanted (HTTP, FTP, WAP, email, network filesystem etc.).
Actually, I was working on something like that as well, but with a
different ulterior motive: build something like this as a redundant,
peer-to-peer-capable database and many of the archiving-old-data
problems just vanish. (Indeed, it would make a nice P2P system as well.)
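
The core of it really is just "keep one master copy, convert on
request". A quick sketch of that part (the converters are placeholders,
and the transport side, HTTP/FTP/mail/whatever, would sit on top of it):

  # Sketch: one "pure" master copy, converted to whatever the client
  # asks for at request time. Real converters would call an image
  # library, a PDF writer and so on.

  def to_raw(pages):
      # pages: the master data, e.g. a list of raw scan blobs
      return pages

  def to_html(pages):
      imgs = "\n".join('<img src="page-%04d.png">' % i
                       for i in range(len(pages)))
      return "<html><body>\n%s\n</body></html>" % imgs

  CONVERTERS = {
      "raw":  to_raw,
      "html": to_html,
      # "pdf": to_pdf, ...  added as somebody writes them
  }

  def fetch(pages, fmt):
      # hand out the document in the client's requested format
      if fmt not in CONVERTERS:
          raise ValueError("no converter for format %r" % fmt)
      return CONVERTERS[fmt](pages)

Only the master data ever needs to be backed up; everything else is
derived and can be thrown away or regenerated at will.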
> I could see some of the big archives around the planet (regardless of
> content) going this way in the future; user base is maximised through
> offering different formats whilst the "pure" dataset is all that's
> backed up and actually kept on disk.
That's what I dream about in long nights, though not as a centralized
database but as a distributed one. Imagine you feed in a raw-encoded
audio CD (with all the interleaved stuff intact, even including all the
mischievous, intentional errors), and a front-end producing WAV files
from that (which another front-end could use to produce ogg/mp3/wma/you
name it).
Concepts like that can be applied to nearly all media types. Just record
what the underlying recording/reading machinery gets and write filters
for that. These filters may get as complex as filesystem drivers. In
fact, this *is* a layered filesystem. Remember the thread(s) about how
to rescue tapes? ...or HDD images? Apply these general concepts and
things may get easier :)
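
To sketch the filter stacking (names invented, and the actual
de-interleaving/decoding left out; the only rule is that each layer
talks only to the layer below it, so the raw capture never gets touched):

  class RawCDImage:
      # the master copy: the bits exactly as the drive delivered them
      def __init__(self, path):
          self.path = path
      def read_sectors(self):
          with open(self.path, "rb") as f:
              return f.read()

  class WavFrontend:
      # turns raw sectors into PCM tracks (error handling etc. omitted)
      def __init__(self, lower):
          self.lower = lower
      def tracks(self):
          raw = self.lower.read_sectors()
          return [raw]          # real code would de-interleave here

  class OggFrontend:
      # yet another layer, stacked on top of the WAV one
      def __init__(self, lower):
          self.lower = lower
      def encode_all(self):
          return [("track-%d.ogg" % n, pcm)
                  for n, pcm in enumerate(self.lower.tracks(), 1)]

  # usage:
  # OggFrontend(WavFrontend(RawCDImage("disc.raw"))).encode_all()

A tape or HDD image would just swap in a different bottom layer and a
different set of filters on top.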
For now (as we're not (yet) there), providing space isn't my main
problem. It's about having (time to write) the software.
Regards, JBG
--
Jan-Benedict Glaw jbglaw at lug-owl.de . +49-172-7608481 _ O _
"Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg _ _ O
fuer einen Freien Staat voll Freier B?rger" | im Internet! | im Irak! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));