I tend to do that too. I'd much rather use my favourite image viewer to
flick through images than deal with a pdf file, plus I can just put
everything in a tar or zip archive if I do need to distribute as a
single file.
I have to admit that I prefer pdf, just because
all the platforms I use have a decent pdf reader.
(Well, xpdf on OpenVMS VAX is slow, but then I
guess my expectations are at fault there :-))
I never thought I'd say it, but maybe wrapping the data up in *simple*
HTML markup is the best way - at least then it is readable in a
plain-text editor, and finding a machine with a web browser is probably
easier than finding a machine with Word installed.
If you've OCRed the data, HTML is probably fine for
pure text. Once you start to have text + images then
you have a bunch of files to keep tied together. You
could zip them up but PDF works well for me here too.
I believe (but have not tried) that you can go
from PDF to text in this case without any great
difficulty (I don't recall what happens to images).
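For the "zip them up" route, here is a minimal sketch of keeping an HTML page and its images tied together in one archive. It assumes Python's standard zipfile module; the function name bundle and the directory layout are hypothetical, purely for illustration.

```python
import zipfile
from pathlib import Path


def bundle(doc_dir, archive_path):
    """Pack every file under doc_dir into a single zip archive.

    Hypothetical helper: archive_path should live outside doc_dir,
    otherwise the archive would try to include itself.
    """
    doc_dir = Path(doc_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(doc_dir.rglob("*")):
            if path.is_file():
                # Store names relative to doc_dir so that relative image
                # links in the HTML still resolve after unpacking.
                zf.write(path, path.relative_to(doc_dir).as_posix())
    return archive_path
```

Calling something like bundle("manual", "manual.zip") would give a single file to distribute, and unpacking it restores the same relative layout the HTML expects.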
Actually, I suppose separate images can help here too as people can
navigate straight away to what they want, plus they don't need to
download the whole of a huge pdf file before they can start reading.
I prefer to grab the whole thing anyway. Today I might
just want the frobozz pinout, but tomorrow I'm almost
certain to need the lead engineer's middle initial,
by which time I'll have forgotten where I found the
docs in the first place.
I know a lot of you expressed concerns about JPEGs, but I haven't been
able to get anywhere near the compression using other methods, for
greyscale images. Am I overlooking any options?
Probably not. JPEG is lossy after all, so I expect it'll always do
better than a non-lossy format. It's a tradeoff between size and quality.
As has been pointed out (quite a few times on this list, I think),
JPEG is very poor for text and line drawings. JPEG is better
than nothing, but the archived version should be in something
more appropriate (and lossless).
problem is that you need to be *really* sure that your OCR versions are
good before you can risk taking the raw scans offline, which means
having a lot of
Once I've generated a raw scan (or picked up someone else's)
I expect to keep it around essentially forever. OCR has improved
immensely in the last few years, but not to the point where
I can throw a scan of a poor quality photocopy at it and expect
something that looks like the original with zero errors.
(The Module/Options list that Eric Smith scanned would be
an excellent torture test for any candidate "perfect" OCR program).
Another point is that if you have high quality scans, why
keep them to yourself? By all means have low-res versions
available for those who just need a page or two or just
need to look something up quickly and don't care about
the artefacts, but make the "masters" available too. If you
don't have the space yourself, there are people on this list
who seem to have no problem with online disk space.
Antonio
--
---------------
Antonio Carlini arcarlini(a)iee.org