Are we talking
JPEGs/PDFs or OCR? How low a DPI can one go? Keeping in
I'd not really considered a file format. It wouldn't be JPEG though, as I
don't like the format.
JPEG is only good for continuous-tone images such as photographs. It does
horrible things to text and line art, since that's not what it was designed
for.
For typical manuals (lots of text and line art, few photographs), GIF is
(barely) tolerable, but doesn't compress all that well. ITU-T Group 4
fax format (Recommendation T.6) compresses quite well (file sizes typically
less than half of the equivalent GIF), and can be encapsulated in either
TIFF Class F or PDF files.
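
To check which compression a scanned TIFF actually uses, something like
the following Python sketch may help. It reads only the Compression tag
(259) from the first image directory and assumes a classic, well-formed
TIFF (not BigTIFF). The function name tiff_compression is made up for
this example.

```python
import struct

COMPRESSION_TAG = 259  # TIFF "Compression" field

# A few common values of the Compression field.
COMPRESSION_NAMES = {
    1: "uncompressed",
    2: "CCITT modified Huffman RLE",
    3: "CCITT Group 3 fax (T.4)",
    4: "CCITT Group 4 fax (T.6)",
    5: "LZW",
    32773: "PackBits",
}


def tiff_compression(data):
    """Return the Compression value of the first image in TIFF bytes.

    Illustrative helper only: no BigTIFF support, no bounds checking.
    """
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")

    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # A single SHORT is stored in the first two bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return 1  # the TIFF default when the tag is absent: no compression
```

With a real file one would pass open(path, "rb").read() (path being
whatever scan is at hand) and look the result up in COMPRESSION_NAMES;
a value of 4 means Group 4 fax.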
JBIG is even better for this stuff, getting file sizes 10 to 20 percent
smaller than G4 fax, but viewers are relatively uncommon.
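
As a rough worked example of those ratios, here is a small Python
sketch. The page size, the 300 dpi resolution and the 100,000-byte GIF
are assumptions picked for illustration, not measurements; the only
figures taken from above are "less than half of GIF" for G4 and "10 to
20 percent smaller than G4" for JBIG.

```python
def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed size of a 1-bit-per-pixel scan, rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


def estimate_from_gif(gif_bytes):
    """Apply the rule-of-thumb ratios to a GIF file size.

    Returns (g4_upper_bound, jbig_low, jbig_high) in bytes, using
    integer arithmetic. These are ballpark figures, not guarantees.
    """
    g4_upper_bound = gif_bytes // 2         # "typically less than half"
    jbig_low = g4_upper_bound * 80 // 100   # 20 percent smaller than G4
    jbig_high = g4_upper_bound * 90 // 100  # 10 percent smaller than G4
    return g4_upper_bound, jbig_low, jbig_high


# A US Letter page (8.5 x 11 inches) scanned at an assumed 300 dpi:
print(raw_bilevel_bytes(8.5, 11, 300))  # 1052700 bytes before compression

# A hypothetical 100,000-byte GIF of that page:
print(estimate_from_gif(100_000))       # (50000, 40000, 45000)
```

Actual results depend heavily on the page content, so these numbers are
only good for estimating disk space.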
I generally use PDF. Although a few whiners claim otherwise, it appears
to me that PDF viewers are now available on almost all contemporary
platforms. Most systems that don't run Acrobat Reader can run Ghostscript
with Ghostview, or xpdf.
I've put PDF files of some old DECsystem-10 manuals on
http://www.36bit.org/
I scanned the pages under Linux, then used the Capture module of Adobe
Acrobat Exchange 3.01 under Windows 95(*) to convert the TIFF files into
a PDF. In the process it OCR'd the pages, and the result is stored as
"invisible" text in the PDF file. That makes it possible to search the
file, even though it displays as a scanned image. The OCR wasn't good
enough to allow archiving the documents as text only; it would take
far too much time to clean them up.
Cheers,
Eric
* I hope Adobe releases the full Acrobat Exchange for Linux; I'd gladly buy
it again. They have a Solaris version, so a Linux port should be trivial.