Hello Joerg,
Since you first published these, the listings have been of great use to me in improving
pdp2011. Before, there were many tests that would just crash, and it would take ages to
figure out why from the binary view in my logic analyzer, and in some cases I just
couldn't figure out what was going on. The listings made things so much easier to
understand, and helped tremendously with implementing much more correct FPGA PDP-11
systems. Before, there were only a few listings and some of them had crucial pages missing
and I just had to guess what happened there - sometimes I guessed right, but more often it
just was a mystery. Some of the XXDP programs are quite devious, and even with the listing
it can be quite the challenge to understand exactly what is going on.
I still regularly check the files I mirrored 5 or so years ago - and sometimes I still
need a couple of days to figure out what is going on. Reading the pages is hardly ever an
issue; understanding what is on them sometimes still is.
Thanks for putting in all the work,
Cheers
Sytse
On 20 Nov 2021, at 08:30, Joerg Hoppe via cctalk
<cctalk at classiccmp.org> wrote:
Hi Friends,
Microfiche scans of the PDP-11 XXDP listings are online now:
http://files.retrocmp.com/fichescanner/bitsavers/pdf/dec/pdp11/microfiche/D…
You can insert this into your bitsavers mirror tree with
$ cd <your-bitsavers-mirror-root>
$ wget --recursive --level 0 --no-host-directories --cut-dirs 2 --no-parent \
    -R index.htm?* http://files.retrocmp.com/fichescanner/bitsavers/
You need about 130 GB of space for 1600+ listings.
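Since the download is large, it may be worth checking free space before starting wget. The following is only a minimal sketch, assuming a POSIX shell with df and awk; the helper name has_free_gb is made up for illustration and is not part of the original instructions.

```shell
# Hypothetical helper (not from the original post): succeed if the
# filesystem holding directory $1 has at least $2 GB available.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -P forces one line per filesystem; -k reports 1024-byte blocks.
    # Field 4 of the second line is the available space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Run from the mirror root before starting the download.
if has_free_gb . 130; then
    echo "enough space for the listings"
else
    echo "less than 130 GB free here"
fi
```

The 130 GB figure is the one quoted above; allow some headroom if the mirror shares the disk with other data.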
A Win10 version of wget is at
http://files.retrocmp.com/wget-1.21.2-win32.zip
In 2016 I posted a batch of listings, which was archived at
http://www.bitsavers.org/pdf/dec/pdp11/microfiche/ftp.j-hoppe.de/...
These were repacked and included in the above distribution.
So although I'm very pleased to see my name on bitsavers:
Please discard the "ftp.j-hoppe.de" directory now!
For each listing there are 3 files:
- a "gray" pdf in archive quality.
- a highly compressed "bw" pdf, about 10x smaller.
- an ASCII *.dat with context and title strip data, prepared for database import.
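After mirroring, one could check that every listing really arrived with all three files. This is a rough sketch only: the suffixes "_gray.pdf" and "_bw.pdf" and the function name list_incomplete are assumptions made for illustration, since the actual file names are not given here. Adjust them to whatever the mirror uses.

```shell
# Hypothetical check: starting from each .dat file in directory $1, print
# the listing name if its gray or bw pdf is missing.
# ASSUMPTION: files are named <name>.dat, <name>_gray.pdf, <name>_bw.pdf.
list_incomplete() {
    dir=$1
    for dat in "$dir"/*.dat; do
        [ -e "$dat" ] || continue      # glob matched nothing
        base=${dat%.dat}               # strip the .dat suffix
        if [ ! -e "${base}_gray.pdf" ] || [ ! -e "${base}_bw.pdf" ]; then
            echo "${base##*/}"         # print the name without the path
        fi
    done
}
```

Called as list_incomplete <directory>, it prints one name per incomplete listing. It does not notice a listing whose .dat file itself is missing.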
The pdfs contain pictures of their fiches as title pages.
The quality of the fiches ranges from "brilliant" to "awful".
DEC made every possible error while preparing them; the list is endless.
My favorite bug: Title strips glued to the wrong fiche (corrected here).
I even tried OCR, but the results were poor.
"ocrmypdf" (= "tesseract + pdf") seems to be a good tool, but
the fiches are too problematic for a fully automatic run.
You have to dive into tesseract's training procedures.
See
https://hub.docker.com/r/jbarlow83/ocrmypdf/
Some project links:
http://www.retrocmp.com/projects/scanning-micro-fiches
https://youtu.be/X22gr5THBRA
https://hackaday.com/2021/09/17/automatic-microfiche-scanner-digitizes-docs/
By the way: This project ate up lots of (physical and personal) resources.
I will scan other document sets in the future, maybe begging for a donation then.
Enjoy!
Joerg