At 10:41 AM 21/07/2019 -0600, you wrote:
On Sun, Jul 21, 2019, 4:16 AM Joseph S. Barrera III via cctalk <cctalk at classiccmp.org> wrote:
>I'd suggest that in 2019 when bits are cheap and high-quality scanners
>nearly as cheap, "crappy quality digital image" is a bit of a straw man.
>Yes, I've seen plenty of barely-readable or practically unreadable scans,
>but they were made years or decades ago.
There are still plenty of bad scans being done today, for various reasons.
The technology of producing a final digital copy continues to improve and has a way to go yet.
*This* is why I strongly oppose destroying rare docs to scan them, now. Better to wait
till non-destructive scanning methods become available.
>What dpi qualifies as not "crappy"? 300dpi? 400? 600?
Points:
1. Both the DPI and bits/pixel affect the visual result. Having shaded pixels on curved edges
makes the eye see a smooth curve, where the same resolution in two-tone (B&W) would look jagged.
Achieving an optimal balance of resolution and shading levels for various types of content and
fineness of detail, vs file size, is a bit of an art.
But ultimately it's a simple test: look at the paper original, and your final result on screen
(at 1:1 final scale.) Does the quality look the same?
Is your copy how the original publisher would have wanted the doc to appear?
People only auto-producing PDFs rarely catch on to this, because PDF ONLY encodes as one of:
two-tone B&W (fax mode), or JPG (or JPEG2000 rarely) or the execrable JBIG2 (Never use this!)
Experiment with PNG encoding, via a tool like Irfanview, which allows flexibly setting PNG
bits/pixel, raw, indexed color or gray scale. PNG is a lossless encoding, and so the only
resolution loss is by your choice while rescaling in post-processing.
2. The resolution you scan at, and the final presentation resolution, won't be the same.
Especially when the pages include elements like screened color or B&W images. To deal with
these properly you MUST scan at a resolution several times higher than the screen dot pitch.
Otherwise there will be moire patterns (beats) between the scan sampling and the screening dots.
Then you post-process to eliminate the screening, and end up with a truly tonal image at the
resolution the eye would perceive when viewing the original screened image.
This avoids any moire patterning, realizes the original publisher's visual intent, and enables
minimizing the final file data size.
B&W text should be encoded with at least 16 gray levels available for edge shading, i.e. 4 bits/pixel.
B&W tonal images need at least 256 level gray scale, or the eye sees quantization of shades (aka
posterization.)
Colour images need either 24 bit/px, ie 8 bits each for RGB, or if there are a limited number
of flat colours an indexed color scheme may work. 256 colors or less, ie an 8 bit index per pixel.
Typical utilities will generate the color table automatically (which can sometimes be a pain.)
PDF does not allow any of these kinds of user choices.
3. The final page images don't have a 'dots per inch' dimension. They have only total number of
pixels in H & V. When doing final page image down-scaling and choice of encoding, you have to
make an aesthetic decision on final pixel dimensions.
If your original page was US Letter (8.5" wide) and you scanned at 600 DPI, that's 5100 pixels wide.
But you'll likely find that the final copy can be scaled to around 1000 to 1200 pixels wide,
with 4 bits/px (if B&W text), for an on-screen page image indistinguishable from the original.
4. All post processing should be done in 24 bit RGB, at the full scan resolution. Keep staged backups.
NEVER use any indexed color scheme when scaling, rotating, etc. The result is unavoidably bad.
The final two steps should be: rescale to desired X-Y pixel size, THEN down-code to final
color system and file encoding. There's a discussion of this in http://everist.org/temp/On_scanning.htm
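To illustrate why the order in point 4 matters, here is a minimal pure-Python sketch. It is not taken from Irfanview or any other tool; the tiny synthetic "scan" and the helper names (diagonal_edge, box_downscale, quantize, decimate) are made up for this example, and a plain box average stands in for whatever resampling filter you actually use.

```python
def diagonal_edge(size):
    """Synthetic stand-in for a scan: black (0) left of a slanted edge, white (255) right."""
    return [[0 if 2 * x < 20 + y else 255 for x in range(size)] for y in range(size)]

def box_downscale(img, factor):
    """Rescale by averaging factor x factor blocks at full gray depth."""
    h, w = len(img) // factor, len(img[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = sum(img[y * factor + dy][x * factor + dx]
                        for dy in range(factor) for dx in range(factor))
            row.append(round(total / (factor * factor)))
        out.append(row)
    return out

def quantize(img, levels):
    """Down-code to `levels` evenly spaced grays (16 levels = 4 bits/pixel)."""
    step = 255 / (levels - 1)
    return [[round(round(p / step) * step) for p in row] for row in img]

def decimate(img, factor):
    """Naive rescale that just keeps every factor-th pixel."""
    return [row[::factor] for row in img[::factor]]

scan = diagonal_edge(40)

# Recommended order: rescale at full depth, THEN down-code to 4 bits/pixel.
good = quantize(box_downscale(scan, 4), 16)

# Wrong order: down-code to two-tone first, then rescale.
bad = decimate(quantize(scan, 2), 4)

for row in good[:3]:
    print(row)
```

In the first result the pixels along the slanted edge come out as intermediate grays, which is the edge shading the eye reads as a smooth line. In the second, every pixel is still pure black or white, so the edge can only be a staircase.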
In general, 'acceptable' resolution VERY MUCH depends on the content.
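To put rough numbers on points 2 and 3, here is a small Python sketch of the arithmetic. The 11 inch page height and the 133 lines/inch screen are assumed values for illustration only, the function names are hypothetical, and the beat calculation uses the ordinary sampling-alias formula on a single screen frequency, ignoring screen angle and harmonics.

```python
def scan_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at a given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

def raw_bytes(width_px, height_px, bits_per_px):
    """Uncompressed image size, with each row padded to a whole byte."""
    row_bytes = (width_px * bits_per_px + 7) // 8
    return row_bytes * height_px

def beat_frequency(screen_lpi, scan_dpi):
    """Apparent pattern frequency (cycles/inch) after sampling a screen.

    This is the distance from the screen frequency to the nearest multiple
    of the sampling rate. A result well below screen_lpi means a coarse,
    visible beat (moire); a result equal to screen_lpi means the screen
    dots themselves were captured and can be filtered out afterwards.
    """
    return abs(screen_lpi - round(screen_lpi / scan_dpi) * scan_dpi)

# Working scan: 8.5" wide page (11" height assumed) at 600 DPI, 24-bit RGB.
w, h = scan_pixels(8.5, 11, 600)
working_size = raw_bytes(w, h, 24)

# Final copy: 1200 pixels wide, same aspect ratio, 4 bits/pixel gray.
final_w = 1200
final_h = round(final_w * 11 / 8.5)
final_size = raw_bytes(final_w, final_h, 4)

print(w, h, working_size)            # 5100 6600 100980000
print(final_w, final_h, final_size)  # 1200 1553 931800

# Assumed 133 lpi screen: scanning at 150 DPI aliases it, 600 DPI resolves it.
print(beat_frequency(133, 150))      # 17
print(beat_frequency(133, 600))      # 133
```

These are uncompressed sizes, so real PNG files will be smaller, but the ratio shows how much room there is between the working scan and the final page image.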
>I just scanned my Rainbow 100 User's Manual at 300, 600 and 1200dpi using the scansnap default settings. You see a jump between 300 and 600, but little difference going on up to 1200 for this material. I posted the 300dpi results and even they are acceptable. Some of the diagrams look heavier than the 600dpi version and at high zoom you see pixelated letters, where the 600 doesn't. The 1200 is hard to see any big difference and takes 4x as long to scan. I think I'll be scanning the remaining rainbow docs at 600dpi. The file is 22MB vs 12MB, so that's worth it. The 1200dpi version was almost 70MB which is starting to get a bit large for a 60 sheet document. The sweet spot seems to be 600dpi, at least for this material.
Just wondering if you're aware of the freeware util Irfanview? https://www.irfanview.com/
It's very capable for batch processing large sets of images. Rescaling, changing coding, cropping, etc.
Guy
So, I have a bunch of old DEC Rainbow docs that aren't online. I also have
a snapscan scanner that I use for bills and such.
There's four kinds of docs, and I'm looking for advice:
(1) wire-ring bound. What's the best way to scan these? The easiest is to
just clip the wire binding and drop it in the scanner. But then what?
(2) Folded with staples. These are booklet format, with staples in the
middle. I could easily remove the staple and scan, but how do I replace the
staple?
(3) Gum bound. These books are bound with some kind of gum / goo on the
spine. Some of these are so old I could just remove it and have no real
degradation of the state. Others have spines that are still in good shape.
(4) Three ring binder. This is easy: remove, scan, replace. Right?
Finally, how do I get the resulting scans into bigkeeper? Any fancy options
I should enable to make the pdfs maximally useful?
Warner
Hi,
This crossed my radar earlier today. I figured that someone on the
CCTalk mailing list might be interested in it.
Link - Vintage 1995 Novell WordPerfect 5.1+ for VMS TK50 Tape Digital
DEC VAX
- https://www.ebay.com/itm/133114102939
Buy It Now for $49.95 ($14.95 S&H) or Make an Offer.
--
Grant. . . .
unix || die
At 08:51 PM 18/07/2019 -0600, you wrote:
>On 7/18/19 3:50 PM, Warner Losh via cctalk wrote:
>> So, I have a bunch of old DEC Rainbow docs that aren't online. I also
>> have a snapscan scanner that I use for bills and such.
>>
>> There's four kinds of docs, and I'm looking for advice:
>
>I always wanted to apply (fiber) optics to this. I wanted something
>that was akin to a (glass) block that I could set on the bed of a
>scanner that would be tall enough that I could open books 90°-110° with
>the to be scanned side sitting on top of the raised / extended scanner
>bed with the book pages laying off to one side. Much like you would see
>if someone was reading the book while laying on their back.
>
>I don't know if anything like this exists or is even possible.
Same thing, much simpler. Called an Edge Scanner. (google) It's just a normal
travelling sensor scanner, but without all the wasted space along one side.
They usually can scan to within a few mm of the edge of the glass plate,
and there's no side structure beyond the glass plate edge. You just raise
the scanner up on blocks to give sufficient vertical clearance at the side
for your book width. There's still the issue of compressing the book to
ensure the pages lay properly flat on the glass.
For this 'small edge' you pay a lot extra, even though many existing scanners
can be hacked to be edge scanners just by cutting away excess garbage at one side.
The usual corporate calculated feature-limitation bullsh*t.
I have a few related UNFINISHED articles online:
http://everist.org/temp/edge/20150214_hacking_edge.htm
http://everist.org/temp/On_scanning.htm
http://everist.org/temp/20140812_disconnecting_the_dots.htm
And threads like this make me hate myself for not having finished those.
Too busy, and they are all halted by dependencies on _other_ unfinished/
unsolved problems.
I have a lot more to say about the wisdom of destroying original publications
to scan them, especially when you are not already an expert at scanning and
the many tradeoffs.
But have to go afk just now.
Guy
OK. I've done the first of the manuals I have. Thanks for all the helpful
hints.
I took apart the Rainbow User's Manual's metal spiral spine. I scanned it
with scansnap and ran it through the indexing function. I think I tweaked
the settings in a reasonable way.
The results look good to my eye, but I'm not 100% sure, so I thought I'd
post it here for feedback:
https://people.freebsd.org/~imp/EK-P100E-OM-001_Rainbow_100_Owner's_Manual-Nov-1982.pdf
I have the manual still apart and can do additional scanning runs easily
enough. The paper is in great shape.
Second, how do I submit this to bitkeepers? I've looked around and don't
see how. Maybe I'm just being blind...
Warner
At 04:50 PM 7/18/2019, Warner Losh via cctalk wrote:
>(1) wire-ring bound. What's the best way to scan these? The easiest is to
>just clip the wire binding and drop it in the scanner. But then what?
Those are going to snag on each other, no? I'd trim the edges off.
>(2) Folded with staples. These are booklet format, with staples in the
>middle. I could easily remove the staple and scan, but how do I replace the
>staple?
Cut along the middle using a paper cutter.
>(3) Gum bound. These books are bound with some kind of gum / goo on the
>spine. Some of these are so old I could just remove it and have no real
>degradation of the state. Others have spines that are still in good shape.
Probably needs a pro paper cutter.
- John