Back in March 2001 I posted about a cache of 20,046 pages
of scanned docs I received from someone on the net.
See the TOC below, followed by his explanation of how
he did it.
It consumed several CD-Rs, compressed. I now have a DVD burner
as well, so I'd be glad to make copies on new or old media.
(It is actually all available on a hidden web page, which I'll
disclose to anyone who sends me a pointed email, but I'd hate to
stress my little T-1.)
Anyone care to upgrade it to OCR'd PDF, or to whatever would be
considered the next-best method of preservation and searchability?
I know it's possible with a handful of Linux tools, but my to-do list
is already too long.
- John
---
I've made contact with a guy who's scanned 20,046 pages of the
docs listed below, at 300 to 400 DPI. He first told me about the
UCSD p-System docs he'd scanned. Below the list is his description
of the process he followed.
I'm planning to get a copy of what he has and burn it to CD-R.
Does anyone else have an interest in these docs, or have any
ideas about distribution without massive copyright violation?
- John
6502
  MOS 6502 datasheet
  6502 Assembly Language Subroutines (Leventhal)
AMD
  AMD 29000 Memory Design Handbook
  Am29027 Arithmetic Accelerator
  Am29C327 Floating Point Processor
Data General
  C Language Reference Manual
  GATE User's Manual
  AOS/VS Internals Manual
  AOS/VS Programmer's Manual, volume 1
  AOS/VS System Calls Dictionary
  CEO User's Manual
  Eclipse 32-bit Principles of Operation
  Eclipse 32-bit System Functional Characteristics
  Fortran-77 Environment Manual
  Fortran-77 Reference Manual
Fairchild
  Clipper User's Manual
IDT
  RISC System Programmer's Guide
  R3000 Assembly Language Programmer's Guide
  R3000 Hardware User Manuals
  R3000 Language Programmer's Guide
  High-speed CMOS databook
Motorola
  68000 Family Reference
  68020 User's Manual
  68851 User's Manual
  88100 User's Manual
  88200 User's Manual
  Linear Interface Integrated Circuits
NCR
  53C90A/B Advanced SCSI Controller (2 different manuals)
  53C94/5/6 databook
  53CF94/96-2 Fast SCSI Controller
  Disk Array Controller Firmware
  Disk Array Controller Hardware
  Disk Array Controller Software
  Floppy Disk Controller (SCSI-to-FD)
National Semiconductor
  NS32532 datasheet
  Series 32000 Programmer's Reference Manual
  DP8490 Enhanced Asynchronous SCSI Interface
  NS32CG16 Programmer's Reference Supplement
  Graphics Handbook
  Series 32000 Databook
  DRAM Management databook
  Embedded Controller Databook
Ohio Scientific
  C4P User's Manual (2 different manuals)
  65V Programmer's Manual
  Schematics for:
    502 CPU board
    505 CPU board
    527 24K memory board
    540 Video board
    542 Polled Keyboard
Pinnacle Systems
  2 User's manuals for their 68k machine (my p-System machine)
  p-System manuals, IV.12:
    Operating System Reference
    Program Development Reference
    Application Development Guide
    Fortran 77 Reference
    Assembler Reference
Weitek
  WTL4167 Floating-Point Coprocessor datasheet
Most of these are from about 1988 to 1992, with the exception of the OSI
documentation, of course, which is from 1979.
---
What sort of process did you follow? What sort of
devices?
As for the process, I scanned a manual in and checked to make sure
all the pages were there. If they weren't, I'd scan the pages that
didn't make it and go through all the pages again. I'll admit this is a
little anal, but better safe than sorry. (When you're using a lot of
shell scripts, you never know if you've accidentally deleted a page with
an "mv" command.) When all the pages were there, I'd go through the
manual one more time to check general quality (no folded corners, no
torn pages, etc.). If all was good, the manual would be moved to the
directory that would become the root directory of my CD-ROM. That's
pretty much it.
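The "are all the pages there?" check is easy to automate. A minimal
sketch, assuming a hypothetical naming scheme of one TIF per page
(page0001.tif, page0002.tif, ...) and that the expected page count is
known from the printed manual; the function names are illustrative,
not the scanner's actual scripts.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: one TIF per page, e.g. "page0001.tif".
PAGE_RE = re.compile(r"page(\d+)\.tiff?$", re.IGNORECASE)


def page_numbers(filenames):
    """Extract the page numbers from file names that match PAGE_RE."""
    numbers = set()
    for name in filenames:
        m = PAGE_RE.search(name)
        if m:
            numbers.add(int(m.group(1)))
    return numbers


def missing_pages(filenames, expected_count):
    """Return the page numbers in 1..expected_count that have no file."""
    present = page_numbers(filenames)
    return [n for n in range(1, expected_count + 1) if n not in present]


def check_manual(directory, expected_count):
    """List the missing pages among the scans in one manual's directory."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    return missing_pages(names, expected_count)
```

For example, check_manual("scans/68020", 450) would return the page
numbers still to be rescanned; an empty list means the manual is
complete and can move on to the quality pass.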
The big manuals of more than 1000 pages really sucked, because I'd
generally have to make 3 or more passes to get those completely correct.
If I was going to do it again, I'd probably break the larger manuals
into smaller chunks to avoid this problem.
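Breaking a big manual into chunks amounts to assigning page ranges up
front, so that a missed page only forces a recheck of its own chunk. A
minimal sketch, assuming a hypothetical fixed chunk size of 250 pages
(the post gives no figure):

```python
def chunk_ranges(total_pages, chunk_size=250):
    """Split pages 1..total_pages into inclusive (first, last) ranges."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ranges = []
    first = 1
    while first <= total_pages:
        last = min(first + chunk_size - 1, total_pages)
        ranges.append((first, last))
        first = last + 1
    return ranges


def chunk_for_page(page, chunk_size=250):
    """Return the 1-based chunk number that a given page falls into."""
    return (page - 1) // chunk_size + 1
```

With these defaults a 1000-page manual becomes four chunks of 250
pages, so a page lost in chunk 3 means rechecking 250 pages instead
of all 1000.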
One thing that made the whole process a lot easier was the netpbm
utilities. I wrote a script to convert the manuals from ~2500x3300 TIFs
to ~500x600 GIFs. My machine takes about 2 seconds to process a 300-400
DPI TIF, but only a fraction of a second for a 75 DPI GIF. I'd run my
script, then do something else for a while. When it was done, I could
flip through the GIFs with GQview and inspect about 2-4 pages per
second. That saved a lot of time.
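The conversion script itself isn't included here. A minimal sketch of
the same idea, assuming the pages are .tif files in one directory and
chaining the classic Netpbm programs tifftopnm, pnmscale, pnmquant and
ppmtogif through pipes. Newer Netpbm releases provide pamscale and
pamtogif as successors, so substitute those if the older names are
missing. The 500x600 bounding box follows the ~500x600 previews
described above; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def preview_pipeline(tif_path, box=(500, 600), colors=256):
    """Return the Netpbm commands that turn one TIF page into a small GIF."""
    width, height = box
    return [
        ["tifftopnm", str(tif_path)],                     # TIF -> PNM
        ["pnmscale", "-xysize", str(width), str(height)],  # fit inside box
        ["pnmquant", str(colors)],                         # GIF allows <= 256 colors
        ["ppmtogif"],                                      # PNM -> GIF on stdout
    ]


def gif_path_for(tif_path, out_dir):
    """Map scans/page0001.tif to <out_dir>/page0001.gif."""
    return Path(out_dir) / (Path(tif_path).stem + ".gif")


def make_preview(tif_path, out_dir, box=(500, 600)):
    """Run the pipeline, feeding each program's stdout to the next one's stdin."""
    commands = preview_pipeline(tif_path, box)
    out_path = gif_path_for(tif_path, out_dir)
    with open(out_path, "wb") as out:
        procs = []
        prev_stdout = None
        for i, cmd in enumerate(commands):
            last = i == len(commands) - 1
            proc = subprocess.Popen(
                cmd,
                stdin=prev_stdout,
                stdout=out if last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Close our copy so the upstream program gets SIGPIPE
                # if a downstream program exits early.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        for proc in procs:
            if proc.wait() != 0:
                raise RuntimeError(f"{proc.args[0]} failed on {tif_path}")
    return out_path


def make_previews(tif_dir, out_dir):
    """Convert every .tif in tif_dir into a preview GIF in out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return [make_preview(p, out_dir) for p in sorted(Path(tif_dir).glob("*.tif"))]
```

Running make_previews("scans/68020", "previews/68020") and then
opening the output directory in an image viewer such as GQview gives
the same quick flip-through that the conversion step above enables.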
I assume that, by "devices", you mean what type of scanners I used. I
started with an HP 6350cse (with ADF) that I bought for this very
purpose. However, having never owned a scanner before, I was a little
disappointed with how slow the "fast" scanners were. Fortunately, imaging
is an integral part of the software my company sells, and, as luck would
have it, we were demoing a new scanner from Fujitsu. This thing
literally does 60 pages/min at 300 dpi, on *both* sides. It's about half
as fast at 400 dpi, which I had to use for the IC databooks to get the
fine print. Needless to say, I did most of my scanning on that.
By the way, to date, I've processed 20,046 pages. I'm kinda burned out,
though, so it'll be a while before I do any more.