Leaving them as a scanned image is the easy way out, but it isn't always
practical. Some of my pages have very small print, and the image
resolution needed to keep that text readable makes for huge files.
For pages that consist solely of text and line art, I scan them at 300 DPI
as TIFF Class F Group 4. That takes only 40-120K per page. I put the
resulting images into a PDF file, since most people don't have any other
G4-capable reader, and G4 is supported as a native PDF image format.
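To put the 40-120K figure in perspective, here is a back-of-envelope
sketch in Python. The page size (US Letter, 8.5 x 11 inches) and reading
"K" as 1024 bytes are assumptions of mine; neither is stated above.

```python
# Back-of-envelope size of a bilevel (1 bit per pixel) page scan.
# Assumptions (not from the text above): US Letter paper, K = 1024 bytes.
DPI = 300
WIDTH_IN, HEIGHT_IN = 8.5, 11.0


def raw_bilevel_bytes(dpi, width_in, height_in):
    """Uncompressed size in bytes at 1 bit per pixel, rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


raw = raw_bilevel_bytes(DPI, WIDTH_IN, HEIGHT_IN)
print(f"uncompressed: {raw} bytes")
for compressed_kb in (40, 120):  # the range quoted above
    ratio = raw / (compressed_kb * 1024)
    print(f"{compressed_kb}K per page is roughly {ratio:.0f}:1 compression")
```

Under those assumptions an uncompressed page is about 1 MB, so G4 is
buying somewhere around a factor of 9 to 26.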
Some people always flame me because they dislike PDF, since they can't run
Acrobat Reader on their Commodore 64, but realistically I've found that
more people have access to Acrobat Reader than to any other viewer. My
attitude is that if I spend the time to scan the docs and make them
available free on my web site, people who don't like it can take a hike.
I've written a program using PDFlib to automate creating the PDF from a
directory full of G4 files.
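The general idea can be sketched without PDFlib. What follows is NOT the
program mentioned above, just a minimal stand-in in plain Python that
copies the G4 data unchanged into PDF image objects using the
CCITTFaxDecode filter. All function names are made up for illustration.
It assumes single-strip, MSB-first, WhiteIsZero TIFFs, a fixed 300 DPI,
and file names that sort into page order; anything else is rejected or
ignored.

```python
import struct
from pathlib import Path


def read_g4_tiff(data):
    """Return (width, height, g4_bytes) from a single-strip Group 4 TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a TIFF file")
    (ifd,) = struct.unpack(order + "I", data[4:8])
    (count,) = struct.unpack_from(order + "H", data, ifd)
    tags = {}
    for i in range(count):
        tag, typ, n, raw = struct.unpack_from(order + "HHI4s", data, ifd + 2 + 12 * i)
        if n == 1 and typ == 3:    # SHORT, left-justified in the value field
            tags[tag] = struct.unpack(order + "H", raw[:2])[0]
        elif n == 1 and typ == 4:  # LONG
            tags[tag] = struct.unpack(order + "I", raw)[0]
    if tags.get(259) != 4:         # Compression: 4 = CCITT Group 4
        raise ValueError("not Group 4 compressed")
    if tags.get(262, 0) != 0 or tags.get(266, 1) != 1:
        raise ValueError("only WhiteIsZero, MSB-first data handled here")
    if 273 not in tags or 279 not in tags:
        raise ValueError("multi-strip TIFFs are not handled in this sketch")
    start = tags[273]              # StripOffsets; 279 is StripByteCounts
    return tags[256], tags[257], data[start:start + tags[279]]


def build_pdf(pages, dpi=300):
    """pages: list of (width_px, height_px, g4_bytes). Returns PDF bytes."""
    def stream(entries, payload):
        return (b"<< %s /Length %d >>\nstream\n" % (entries, len(payload))
                + payload + b"\nendstream")

    objs, kids, n = {}, [], 3      # objects 1 and 2 are catalog and page tree
    for w, h, g4 in pages:
        img, content, page = n, n + 1, n + 2
        n += 3
        pw, ph = w * 72 / dpi, h * 72 / dpi   # page size in points
        objs[img] = stream(
            b"/Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceGray /BitsPerComponent 1 "
            b"/Filter /CCITTFaxDecode "
            b"/DecodeParms << /K -1 /Columns %d /Rows %d >>" % (w, h, w, h), g4)
        objs[content] = stream(b"", b"q %.2f 0 0 %.2f 0 0 cm /Im0 Do Q" % (pw, ph))
        objs[page] = (b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %.2f %.2f] "
                      b"/Resources << /XObject << /Im0 %d 0 R >> >> "
                      b"/Contents %d 0 R >>" % (pw, ph, img, content))
        kids.append(b"%d 0 R" % page)
    objs[1] = b"<< /Type /Catalog /Pages 2 0 R >>"
    objs[2] = b"<< /Type /Pages /Count %d /Kids [%s] >>" % (len(kids), b" ".join(kids))

    out, offsets = bytearray(b"%PDF-1.4\n"), {}
    for num in sorted(objs):
        offsets[num] = len(out)
        out += b"%d 0 obj\n" % num + objs[num] + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % n
    for num in range(1, n):
        out += b"%010d 00000 n \n" % offsets[num]
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (n, xref_pos))
    return bytes(out)


def pdf_from_directory(src_dir, out_path, dpi=300):
    """Bundle every *.tif in src_dir (sorted by name) into one PDF."""
    files = sorted(Path(src_dir).glob("*.tif"))
    pages = [read_g4_tiff(f.read_bytes()) for f in files]
    Path(out_path).write_bytes(build_pdf(pages, dpi))
    return len(pages)
```

Something like pdf_from_directory("scans", "manual.pdf") would then
produce one page per TIFF. Because the compressed data is copied as-is,
the PDF ends up only slightly larger than the TIFFs put together.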
For greyscale and color images, I'm working on a process to separate out
the images, use G4 coding on the monochrome portion of the page, and
overlay the images in JPEG format. This will also work nicely with
Acrobat Reader, since it supports overlaid images, whereas most other
viewer software doesn't.
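As a rough illustration of that split, and not the actual process (which
is still being worked out), here is a sketch in Python. It assumes the
photo regions are already known as bounding boxes; how to find them is
not covered here, and the function name, box format and threshold are all
hypothetical. The G4 and JPEG encoding steps themselves are left out.

```python
def split_layers(gray, photo_boxes, threshold=128):
    """Split a greyscale page into a bilevel layer plus photo crops.

    gray        -- list of rows, each a list of 0..255 values
    photo_boxes -- (left, top, right, bottom) tuples, right/bottom exclusive
                   (assumed to be supplied by some other step)
    Returns (mono, crops): mono holds 1 for black and 0 for white, with the
    photo areas blanked to white; crops is a list of (left, top, pixels)
    ready to be JPEG-encoded and overlaid at that position.
    """
    mono = [[1 if px < threshold else 0 for px in row] for row in gray]
    crops = []
    for left, top, right, bottom in photo_boxes:
        crops.append((left, top, [row[left:right] for row in gray[top:bottom]]))
        for y in range(top, bottom):
            mono[y][left:right] = [0] * (right - left)  # blank it in the G4 layer
    return mono, crops
```

Blanking the photo areas in the bilevel layer keeps thresholded halftone
noise out of the G4 data, which is the kind of content G4 handles worst.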
Some results of my scanning can be seen at www.36bit.org. Note that
most of those scans were done *before* I got a sheet feeder. In my
experience, although there is some skew with the feeder, there is less
skew than when I do the pages manually, and the skew is more consistent
from page to page. If I get really motivated I'll write some deskewing
software.
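For what it's worth, one common starting point for deskewing is the
projection-profile method: try a range of small angles and keep the one
at which the black pixels pile up into the fewest rows. The sketch below
is only an illustration of that technique in Python, not existing
software; the function name, the angle range and the step size are
arbitrary choices.

```python
import math


def estimate_skew(black_pixels, max_deg=3.0, step_deg=0.25):
    """Estimate page skew in degrees by the projection-profile method.

    black_pixels -- a list of (x, y) coordinates of black pixels, y downward
    Returns the candidate angle along which the text lines run; rotating
    the page by the opposite angle should straighten it.
    """
    best_angle, best_score = 0.0, -1.0
    steps = int(round(max_deg / step_deg))
    for k in range(-steps, steps + 1):
        angle = k * step_deg
        slope = math.tan(math.radians(angle))
        profile = {}
        for x, y in black_pixels:
            row = round(y - x * slope)      # row index after undoing the slope
            profile[row] = profile.get(row, 0) + 1
        # Sum of squares is largest when pixels concentrate in few rows.
        score = sum(c * c for c in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Since the feeder's skew is fairly consistent from page to page, the
estimate from a few pages could in principle be reused for the rest.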
Eric