Hi Friends,
Microfiche scans of the PDP-11 XXDP listings are now online:
http://files.retrocmp.com/fichescanner/bitsavers/pdf/dec/pdp11/microfiche/D…
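You can insert them into your bitsavers mirror tree with
$ cd <your-bitsavers-mirror-root>
$ wget --recursive --level 0 --no-host-directories --cut-dirs 2 \
    --no-parent -R 'index.htm?*' \
    http://files.retrocmp.com/fichescanner/bitsavers/
(--no-host-directories and --cut-dirs 2 strip the host name and the
"fichescanner/bitsavers/" prefix, so the files land in the usual
pdf/dec/... layout below your mirror root.)
You need about 130 GB of disk space for the 1600+ listings.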
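Before starting the download it may be worth checking the target disk. A minimal POSIX shell sketch, assuming "df -P -k" (whose fourth column is the available space in 1024-byte blocks) and treating "GB" as 10^9 bytes; the function name has_free_gb is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper (not part of the distribution): succeed if the
# file system holding directory $1 has at least $2 GB (10^9 bytes) free.
has_free_gb() {
    dir=$1
    need_gb=$2
    # POSIX "df -P -k": line 2, field 4 = available 1024-byte blocks
    avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    # Convert the requirement from GB (10^9 bytes) to 1024-byte blocks
    need_kb=$(( need_gb * 1000000000 / 1024 ))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example: check the mirror root (current directory) before running wget
if has_free_gb . 130; then
    echo "OK: at least 130 GB free"
else
    echo "Warning: less than 130 GB free here" >&2
fi
```

If df fails, the comparison fails as well, so the helper errs on the side of "not enough space".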
A Windows 10 build of wget is available at
http://files.retrocmp.com/wget-1.21.2-win32.zip
In 2016 I posted a batch of listings, which was archived at
http://www.bitsavers.org/pdf/dec/pdp11/microfiche/ftp.j-hoppe.de/...
These have been repacked and included in the distribution above.
So although I'm very pleased to see my name on bitsavers,
please discard the "ftp.j-hoppe.de" directory now!
For each listing there are 3 files:
- a "gray" PDF in archive quality.
- a highly compressed "bw" PDF, about 10x smaller.
- an ASCII *.dat file with context and title strip data, prepared for
  database import.
The PDFs contain pictures of their fiches as title pages.
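Since every listing comes as one *.dat plus two PDFs, a rough completeness check after the download is to compare the file counts. This is only a sketch: it assumes you point it at the directory the fiche listings were mirrored into (not the whole bitsavers tree, which holds many other PDFs), and the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: compare file counts under directory $1 against the
# "one *.dat + two PDFs per listing" layout. Returns 1 on a mismatch.
check_listing_counts() {
    root=$1
    # awk 'END { print NR }' counts lines without wc's space padding
    dats=$(find "$root" -type f -name '*.dat' | awk 'END { print NR }')
    pdfs=$(find "$root" -type f -name '*.pdf' | awk 'END { print NR }')
    echo "listings (*.dat): $dats"
    echo "PDF files:        $pdfs"
    # Each listing should contribute one "gray" and one "bw" PDF
    if [ "$pdfs" -ne $(( 2 * dats )) ]; then
        echo "Warning: expected $(( 2 * dats )) PDFs" >&2
        return 1
    fi
}
```

Run it on the directory below pdf/dec/pdp11/microfiche/ where the new listings landed; a leftover "ftp.j-hoppe.de" directory inside the checked tree may skew the counts.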
The quality of the fiches ranges anywhere from "brilliant" to
"awful".
DEC made every possible error while preparing them; the list is endless.
My favorite bug: title strips glued to the wrong fiche (corrected here).
I even tried OCR, but the results were poor.
"ocrmypdf" (= "tesseract + pdf") seems to be a good tool, but
the fiches are too problematic for a fully automatic run:
you have to dive into tesseract's training procedures.
See
https://hub.docker.com/r/jbarlow83/ocrmypdf/
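For anyone who wants to experiment anyway: the Docker image above reads a PDF from stdin and writes to stdout when given "-" "-" as input and output names. A minimal batch sketch, assuming Docker is installed; --deskew and -l eng are ordinary ocrmypdf options, while the function name ocr_dir and the *.ocr.pdf output naming are made up here. Given the caveat above, treat its output as a starting point for tuning, not a finished result:

```shell
#!/bin/sh
# Hypothetical helper: OCR every PDF in directory $1 into directory $2
# using the jbarlow83/ocrmypdf Docker image (stdin -> stdout via "- -").
# No tesseract training or per-fiche tuning is done here.
ocr_dir() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/*.pdf; do
        [ -e "$f" ] || continue          # no PDFs: glob stays literal
        out="$dst/$(basename "$f" .pdf).ocr.pdf"
        if ! docker run --rm -i jbarlow83/ocrmypdf \
                --deskew -l eng - - <"$f" >"$out"; then
            echo "OCR failed: $f" >&2
            rm -f "$out"                 # drop partial output
        fi
    done
}
```

Usage would be something like "ocr_dir some-listing-dir ocr-out"; failed files are reported on stderr and skipped, so one bad fiche does not stop the batch.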
Some project links:
http://www.retrocmp.com/projects/scanning-micro-fiches
https://youtu.be/X22gr5THBRA
https://hackaday.com/2021/09/17/automatic-microfiche-scanner-digitizes-docs/
By the way: This project ate up lots of (physical and personal) resources.
I'll scan other document sets in the future, maybe begging for a
donation then.
Enjoy!
Joerg