SC41MS.pdf was an Emulex manual that probably came
from moremanuals -- apologies. To be fair, it is a PDF that
has been OCR'ed down to text, then recomposed as the
document it was. Probably beyond the scope of this
discussion. Here's another** example,
ENIAC press release (3 pages / 48 K)
although admittedly from people with a lot more time and
money on their hands.
The main reason I scan at 600 dpi and don't save space
by dropping to 300 dpi is that the current crop of OCR
software seems (in my limited experience) to produce
better results with 600 dpi. (I cannot go higher than
600 anyway, but I probably would not bother with 1200 dpi
because the OCR stuff I have won't even accept it.)
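
For a rough sense of the space trade-off between 300 and 600 dpi, here is a
back-of-the-envelope sketch. It assumes US Letter pages (8.5 x 11 in) scanned
as 1-bit black and white with no compression; those figures and the function
name are my own assumptions for illustration, and real PDFs are compressed,
so actual file sizes will be much smaller. The ratio is the point, not the
absolute numbers.

```python
def raw_page_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=1):
    """Uncompressed size in bytes of one scanned page (illustrative only).

    Page size and bit depth are assumptions, not figures from the thread.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


for dpi in (300, 600):
    print(f"{dpi} dpi: {raw_page_bytes(dpi):,} bytes uncompressed")

# Doubling the resolution quadruples the pixel count.
print("ratio:", raw_page_bytes(600) / raw_page_bytes(300))
```

So under these assumptions, dropping to 300 dpi saves roughly a factor of
four before compression.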
I'd rather scan the manuals once and wait 10 years
for OCR to get to the stage that it can handle the docs
with maybe one error per 100 pages (rather than
the current standard of multiple errors per page ...).
I don't really want to be going back and doing it again!
The long term goal is definitely OCR. Although by the
time it's good enough we may well all have OC48 feeds
to the home and C3D recordable 125GB drives so there
may be no need :-)
I did notice that many of the moremanuals pages tend
to show a scanning line when they're first displayed, as if true
white is noisy in some way -- it is either light grey or contains
significant dot content.
Is that when a page is first displayed (i.e. for each
change of page) or just the first page? I assume that
you download and then view locally (I have issues
with both NS & IE when trying to use them to view
local PDFs via HTML ... I assume it would be
even worse with a download thrown in for
good measure!)
BTW thanks for doing all the work!
No problem. I was backing up my manuals
just in case and then saw others making theirs
available and followed suit. I just forgot to
stop when I ran out of my own manuals ...
**a pretty spectacular example showing the alignment
problems of the original typewriter, etc. Parent page:
An interesting PDF. They get strange changes of font
on pages 2 & 3 - I wonder if the original has that or
whether it's just an artifact of the OCR. A mixture of
text and background image (which is also text).
I wonder if they might be interested in converting the
multiple hundreds of manuals that live on the current
crop of DEC document web sites ...