On Thu, 2005-05-19 22:20:53 +0000, Jules Richardson <julesrichardsonuk at yahoo.co.uk> wrote:
> On Thu, 2005-05-19 at 23:46 +0200, Jan-Benedict Glaw wrote:
> > I'm still thinking about how paper-based documentation can be made up
> > cleverly enough to gain text as well as images and mixing meta-data into
> > that. Maybe I'd do some C programming and hack something nice producing
> > PDF files holding everything? But first, I'd need to understand PDF
> > (whose specification actually is about 8cm thick...)
>
> Doesn't this sort of imply that PDF is the wrong choice of format for
> jobs like these? (plus I'm pissed at Adobe because their current reader
> for Linux eats close on 100MB of disk space just to let me read a PDF
> file :-)

There are alternatives, like:
  - A tarball containing all the TIFF (or whatever) images, a
    (generated) HTML page (containing some kind of slide show) and a
    small description file (use this with some program, yet to be
    written, to generate the HTML file(s)).
    This gives the chance that the description file can be done
    quite cleverly, so you'll get e.g. a clickable index for the TIFF
    files (this needs to be done manually, but now the workload can
    actually be *distributed*).
  - PDF isn't all that wrong. As far as I understand it, it's
    possible to embed any binary sequence into a PDF file. With a
    program (like an extended tumble) you can produce a readable
    PDF file that also acts as some kind of tarball (though this needs
    a self-written generator/extractor).
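
To make the first alternative a bit more concrete, here is a minimal
sketch in Python (standard library only). Everything in it is an
assumption made for illustration: the description file format (one
"filename<TAB>title" line per scanned page), the member names
description.txt and index.html, and the function names. It is not an
existing tool, just one way the tarball idea could look.

```python
import html
import io
import tarfile


def parse_description(text):
    """Parse hypothetical 'filename<TAB>title' lines into (filename, title) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        filename, _, title = line.partition("\t")
        filename = filename.strip()
        entries.append((filename, title.strip() or filename))
    return entries


def render_index(entries):
    """Render a plain HTML page with one link per scanned page image."""
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(
            html.escape(name, quote=True), html.escape(title)
        )
        for name, title in entries
    )
    return "<html><body><ol>\n" + items + "\n</ol></body></html>\n"


def build_bundle(description_text, images):
    """Return tarball bytes holding the images, the description file and index.html.

    `images` maps a file name to the raw bytes of that scan.
    """
    entries = parse_description(description_text)
    members = dict(images)
    members["description.txt"] = description_text.encode("utf-8")
    members["index.html"] = render_index(entries).encode("utf-8")

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Since the HTML is regenerated from the description file, several people
can each hand-edit a description file and only the generator has to be
rerun.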
I actually *really* like the PDF approach, just because it's so easy and
hassle-free to view the file. Also, if done right, you won't lose
access to your actual image files. But in the long term, we need to work
on the tools. (That's why I started to play a bit with the TeX
That's my wishlist (to be addressed after the vax-linux port has matured :-)
  - PDF files with really cool bookmarks, containing the chapter
    numbers, headings or page numbers (or any mixture of those).
    That means that the backing store might be either a tarball or
    a PDF file (used as a tarball).
  - I'm *really* missing a clickable index. People may invest the
    time to hand-type the original book's index and the pages into
    some description file. From this, I expect a new (clickable)
    index to be generated.
  - If graphical content is OCRed and verified, there should be a way
    to generate the final PDF with the OCRed data (except for
    those pages where this hasn't been done; there, the original
    image file should show up).
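
The last two wishlist items can be sketched roughly as follows, again
in Python. This is only an illustration under stated assumptions: the
index description format ("term: 12, 40-42"), the page image naming
(page007.tif) and all function names are made up here, and HTML is used
as a stand-in for the final PDF, since producing real PDF output would
need a PDF library.

```python
import html


def parse_book_index(text):
    """Parse hypothetical 'term: 12, 40-42' lines into {term: [page numbers]}."""
    index = {}
    for line in text.splitlines():
        # Split at the last colon so that terms may contain colons themselves.
        term, sep, pages = line.rpartition(":")
        if not sep or not term.strip():
            continue  # not an index entry
        numbers = []
        for part in pages.split(","):
            part = part.strip()
            if not part:
                continue
            first, dash, last = part.partition("-")
            start = int(first)
            end = int(last) if dash else start
            numbers.extend(range(start, end + 1))
        index[term.strip()] = numbers
    return index


def render_book_index(index):
    """Render each term followed by links to the matching page anchors."""
    lines = []
    for term in sorted(index, key=str.lower):
        links = ", ".join(
            '<a href="#p{0}">{0}</a>'.format(n) for n in index[term]
        )
        lines.append("<li>{0}: {1}</li>".format(html.escape(term), links))
    return "<ul>\n" + "\n".join(lines) + "\n</ul>"


def render_page(number, ocr_text=None, verified=False):
    """Use the OCRed text only if it was verified, else fall back to the scan."""
    if ocr_text is not None and verified:
        body = "<pre>" + html.escape(ocr_text) + "</pre>"
    else:
        body = '<img src="page{0:03d}.tif" alt="page {0}">'.format(number)
    return '<div id="p{0}">{1}</div>'.format(number, body)
```

The point of the fallback in render_page() is that unverified OCR output
never replaces the scan, so nothing is lost for pages nobody has
proofread yet.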
Jan-Benedict Glaw jbglaw at lug-owl.de . +49-172-7608481 _ O _
"Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg _ _ O
fuer einen Freien Staat voll Freier Bürger" | im Internet! | im Irak! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));