I've observed that many text documents are formatted in HTML rather than
PDF, and have links to separate files for the graphic segments. The
manuals do have considerable text in them, which might benefit from OCR and
re-creation and re-editing of the manuals with only the scanned graphics as
original files. Even some of the schematic segments might be better
re-created due to the fine line pitch, which tends to become really ugly in
scanned documents.
I use Typemaster Pro, which is quite old, but very effective at isolating
graphics from text and perhaps well suited for segregating the text sections
from the illustrations. It's been around for about 10 years, since a time
when nothing else could touch it. My scanner is a 300 DPI monochrome (but
legal size) scanner with a sheet feeder (which I wish would work properly).
In conjunction with this old scanner, the software has done multipage scans to
text of large documents in almost as little time as it takes to read them.
It manages to learn the fonts and handles two typefaces with serifs at the
same time as two without. If your document has more than that, you're on
your own, of course, but it does a nice job, particularly with handling text
which is flowed around some graphics, which it recognizes and leaves
undisturbed.
If, ultimately, the decision is made to serve these documents up in linked
form rather than monolithic form, I'd submit that it is still desirable to
be able to download the entire document as a single object. Some provision
for that must be made, and I don't think it's simple.
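One rough way to offer both forms is to keep the linked files as they are and generate a single archive alongside them. The sketch below is only an illustration under assumptions: the function name bundle_document and the directory layout are hypothetical, and zip is just one possible container.

```python
import zipfile
from pathlib import Path


def bundle_document(doc_dir, archive_path):
    """Pack every file under doc_dir into one zip archive.

    Hypothetical helper: returns the member names in the order written.
    Keep archive_path outside doc_dir so the archive does not try to
    include itself.
    """
    doc_dir = Path(doc_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(doc_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the document root so the links
                # between HTML and graphics still resolve after unpacking.
                arcname = path.relative_to(doc_dir).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

Because member paths stay relative, someone who unpacks the archive gets the same linked layout that is served page by page.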
Comments?
Dick
-----Original Message-----
From: Tony Duell <ard(a)p850ug1.demon.co.uk>
To: Discussion re-collecting of classic computers
<classiccmp(a)u.washington.edu>
Date: Sunday, June 06, 1999 10:36 AM
Subject: Re: Disk Drive Documents
[Wonderful list of docs snipped]
If this stuff is worth preserving, perhaps there's a way to save scanned
images for eventual conversion to PDF. Does anyone know about this?
In my opinion, PDF files are not really appropriate for scanned documents
(they _may_ be more use for documents that are initially created and
distributed in this format). For one (selfish) thing, I've yet to find a
useable way to print these out on any of my machines.
The best way I've seen so far for this is simply to put a directory of
suitable graphics files on an ftp site (.gif seems to compress quite well
- 17"*11" circuit diagrams scanned at 300dpi are around 300K) and provide
a text file describing each page (not just as 'page 7 of the ST506 service
manual' but something like 'page 7 -- Page 1 of 3 of the schematic').
That way, people can download just what they want (if you need a
schematic, you don't waste time downloading parts lists as well). And the
result is portable to a lot more systems.
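The directory-plus-description scheme above could be produced with a small script along these lines. This is a minimal sketch, not anyone's actual tooling: the function name write_index, the index file name INDEX.TXT, and the column layout are all assumptions made for illustration.

```python
from pathlib import Path


def write_index(scan_dir, descriptions, index_name="INDEX.TXT"):
    """Write a plain-text index listing each .gif page with its description.

    descriptions maps a file name to a short human-readable note, e.g.
    "page 7 -- Page 1 of 3 of the schematic". Notes are assumed to be
    plain ASCII. Hypothetical helper: returns the lines it wrote.
    """
    scan_dir = Path(scan_dir)
    lines = []
    for path in sorted(scan_dir.glob("*.gif")):
        # Round the size up to whole kilobytes so people can judge the
        # download before fetching a page.
        size_kb = (path.stat().st_size + 1023) // 1024
        note = descriptions.get(path.name, "(no description yet)")
        lines.append(f"{path.name:<16}{size_kb:>6}K  {note}")
    (scan_dir / index_name).write_text("\n".join(lines) + "\n", encoding="ascii")
    return lines
```

For scale, assuming a 1-bit monochrome scan, a 17"*11" page at 300dpi is 5100 x 3300 pixels, or about 2.1 MB uncompressed, so a file of around 300K corresponds to roughly 7:1 compression.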
If there are substantial text areas in the manual it may be worth trying
to OCR it to a plain ASCII file.
-tony