At 10:41 AM 21/07/2019 -0600, you wrote:
On Sun, Jul 21, 2019, 4:16 AM Joseph S. Barrera III via cctalk <cctalk at
classiccmp.org> wrote:
I'd suggest that in 2019, when bits are cheap and high-quality scanners
nearly as cheap, "crappy quality digital image" is a bit of a straw man.
Yes, I've seen plenty of barely-readable or practically unreadable scans,
but they were made years or decades ago.
There are still plenty of bad scans being done today, for various reasons.
The technology of producing a final digital copy continues to improve and has a
way to go yet.
*This* is why I strongly oppose destroying rare docs to scan them, now. Better to wait
till non-destructive scanning methods become available.
What dpi qualifies as not "crappy"?
300dpi? 400? 600?
Points:
1. Both the DPI and bits/pixel affect the visual result. Having shaded pixels on
curved edges makes the eye see a smooth curve, where the same resolution in
two-tone (B&W) would look jagged.
Achieving an optimal balance of resolution and shading levels for various types
of content and fineness of detail, vs file size, is a bit of an art.
But ultimately it's a simple test: look at the paper original, and your final
result on screen (at 1:1 final scale). Does the quality look the same?
Is your copy how the original publisher would have wanted the doc to appear?
People only auto-producing PDFs rarely catch on to this, because PDF ONLY encodes
as one of: two-tone B&W (fax mode), JPG (or, rarely, JPEG2000), or the execrable
JBIG2 (never use this!)
Experiment with PNG encoding, via a tool like Irfanview, which allows flexibly
setting PNG bits/pixel: raw, indexed color or gray scale. PNG is a lossless
encoding, and so the only resolution loss is by your choice while rescaling in
post-processing.
2. The resolution you scan at, and the final presentation resolution, won't be
the same. Especially when the pages include elements like screened color or B&W
images. To deal with these properly you MUST scan at a resolution several times
higher than the screen dot pitch. Otherwise there will be moire patterns (beats)
between the scan sampling and the screening dots.
Then you post-process to eliminate the screening, and end up with a truly tonal
image at the resolution the eye would perceive when viewing the original screened
image.
This avoids any moire patterning, realizes the original publisher's visual intent,
and enables minimizing the final file data size.
B&W text should be encoded with at least 16 gray levels available for edge
shading, i.e. 4 bits/pixel.
B&W tonal images need at least 256-level gray scale, or the eye sees quantization
of shades (aka posterization).
Colour images need either 24 bits/px, i.e. 8 bits each for RGB, or, if there are
a limited number of flat colours, an indexed color scheme may work: 256 colors or
fewer, i.e. an 8-bit index per pixel.
Typical utilities will generate the color table automatically (which can
sometimes be a pain).
PDF does not allow any of these kinds of user choices.
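To make the bit-depth numbers above concrete, here is a minimal Python sketch. It is only an illustration, not part of any scanning tool: the helper names `levels` and `quantize` are made up for this example. It counts how many distinct shades survive when a smooth 8-bit gray ramp is reduced to 1, 4 or 8 bits per pixel.

```python
def levels(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** bits_per_pixel


def quantize(value: int, bits: int) -> int:
    """Reduce an 8-bit gray value (0-255) to `bits` bits, then map it back to 0-255."""
    n = levels(bits)
    step = round(value * (n - 1) / 255)   # nearest of the n available levels
    return round(step * 255 / (n - 1))    # express that level on the 0-255 scale


ramp = list(range(256))                   # a smooth 8-bit gray gradient

for bits in (1, 4, 8):
    distinct = len({quantize(v, bits) for v in ramp})
    print(f"{bits} bit/px: {levels(bits)} levels, "
          f"{distinct} distinct shades left in the ramp")
```

The fewer shades that survive, the more visible the banding (posterization) in a tonal image. For text edges, 16 levels is the minimum suggested above.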
3. The final page images don't have a 'dots per inch' dimension. They have only
a total number of pixels in H & V. When doing final page image down-scaling and
choice of encoding, you have to make an aesthetic decision on final pixel
dimensions.
If your original page was US Letter (8.5" wide) and you scanned at 600 DPI, that's
5100 pixels wide.
But you'll likely find that the final copy can be scaled to around 1000 to 1200
pixels wide, with 4 bits/px (if B&W text), for an on-screen page image
indistinguishable from the original.
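The arithmetic in point 3 can be sketched in a few lines of Python. The 8.5" width, 600 DPI, 1200 px target and 4 bits/px come from the text. The 11" page height and the helper names are assumptions made only for this illustration. The byte counts are for uncompressed pixel data, before PNG or any other encoding.

```python
def scan_px(inches: float, dpi: int) -> int:
    """Pixel count along one dimension for a given physical size and scan DPI."""
    return round(inches * dpi)


def raw_bytes(width_px: int, height_px: int, bits_per_px: int) -> int:
    """Uncompressed image size in bytes, rounded up to a whole byte."""
    return (width_px * height_px * bits_per_px + 7) // 8


# 8.5" and 600 DPI are from the text; the 11" height is an assumption.
scan_w = scan_px(8.5, 600)
scan_h = scan_px(11.0, 600)

final_w = 1200                                # upper end of the 1000-1200 px range
final_h = round(scan_h * final_w / scan_w)    # keep the aspect ratio

print("scan: ", scan_w, "x", scan_h, "px,",
      raw_bytes(scan_w, scan_h, 24), "bytes raw at 24 bit/px")
print("final:", final_w, "x", final_h, "px,",
      raw_bytes(final_w, final_h, 4), "bytes raw at 4 bit/px")
```

The final image carries only pixel counts. Any "DPI" figure attached to it afterwards is just metadata for printing or display.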
4. All post processing should be done in 24 bit RGB, at the full scan resolution.
Keep staged backups.
NEVER use any indexed color scheme when scaling, rotating, etc. The result is
unavoidably bad.
The final two steps should be: rescale to the desired X-Y pixel size, THEN
down-code to the final color system and file encoding. There's a discussion of
this in
http://everist.org/temp/On_scanning.htm
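Why the order matters can be shown with a deliberately simplified sketch: one row of gray pixels instead of a full RGB page, a plain box filter standing in for a real rescaler, and a two-entry palette standing in for "indexed color". None of this is what Irfanview or any other tool does internally. The function names are made up for the example.

```python
def box_downscale(row, factor):
    """Average each group of `factor` samples (a simple box filter)."""
    return [sum(row[i:i + factor]) / factor
            for i in range(0, len(row) - factor + 1, factor)]


def to_levels(row, bits):
    """Snap each 0-255 value to the nearest of 2**bits evenly spaced gray levels."""
    top = 2 ** bits - 1
    return [round(round(v / 255 * top) * 255 / top) for v in row]


# One scan row across the edge of a black stroke; the edge does not line up
# with the output pixel grid (16 samples, to be reduced 4:1).
row = [0] * 5 + [255] * 11

# Recommended order: rescale at full depth, THEN reduce to 4 bits/px.
good = to_levels(box_downscale(row, 4), 4)

# Wrong order: reduce to a 2-entry palette first, then rescale inside it.
# Every output pixel has to be a palette entry, so the edge shading is lost.
bad = to_levels(box_downscale(to_levels(row, 1), 4), 1)

print("scale then reduce:", good)
print("reduce then scale:", bad)
```

In the first result the pixel straddling the edge keeps an intermediate gray, which is what makes edges look smooth at the lower resolution. In the second it is forced to pure white and the edge goes jagged.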
In general, 'acceptable' resolution VERY MUCH depends on the content.
I just scanned my Rainbow 100 User's Manual at 300, 600 and 1200dpi using the
ScanSnap default settings. You see a jump between 300 and 600, but little
difference going on up to 1200 for this material. I posted the 300dpi results
and even they are acceptable. Some of the diagrams look heavier than in the
600dpi version, and at high zoom you see pixelated letters where the 600 doesn't
show them. At 1200 it is hard to see any big difference, and it takes 4x as long
to scan. I think I'll be scanning the remaining Rainbow docs at 600dpi. The file
is 22MB vs 12MB, so that's worth it. The 1200dpi version was almost 70MB, which
is starting to get a bit large for a 60 sheet document. The sweet spot seems to
be 600dpi, at least for this material.
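For comparison, a small Python sketch of those numbers. The sizes and the sheet count are the ones quoted above, with "almost 70MB" taken as 70. The helper name `pixel_ratio` is made up for the example. It sets the growth in file size against the growth in raw pixel count, which goes with the square of the DPI.

```python
# Sizes as reported above for the same 60-sheet manual.
# The 1200dpi figure was "almost 70MB", so treat it as approximate.
sheets = 60
size_mb = {300: 12, 600: 22, 1200: 70}


def pixel_ratio(dpi: int, base_dpi: int = 300) -> float:
    """Raw pixel count relative to the base DPI (area scales with the square)."""
    return (dpi / base_dpi) ** 2


for dpi, mb in size_mb.items():
    print(f"{dpi:>4} dpi: {mb / sheets:.2f} MB/sheet, "
          f"{mb / size_mb[300]:.1f}x the 300dpi file, "
          f"{pixel_ratio(dpi):.0f}x the raw pixels")
```

The files grow more slowly than the pixel counts, presumably because the scanner software compresses the pages, so the step from 300 to 600 costs less than the raw numbers would suggest.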
Just wondering if you're aware of the freeware util Irfanview?
https://www.irfanview.com/
It's very capable for batch processing large sets of images. Rescaling, changing
coding, cropping, etc.
Guy
The above is all nice and such. My cut:
If the book is a "one of", as in rare with likely few if any other copies in
existence, then preservation comes first, with copies made by means as
non-destructive as possible.
For other books, manuals and docs, with print volumes likely in the range of
tens of thousands or higher, I go the other way: getting the information online
has priority over preservation, as there are plenty of copies to preserve.
Crappy scans are never good, but in some cases for radio repair work they were
the Rosetta stone. Better scans are not hard any more. But I go back to a
525-line black and white camera on a stand taking single frames. My old Canon
CanoScan with "Xsane" does a slow but very good job.
In the end, information is valuable only if it is available, as then we can
share it. There are cases where we have to suffer a less than best electronic
copy because preservation comes first, but a copy in some usable if not the
best form is still better than "I hear there is a book about it".
Based on that, I've taken books and manuals of which I know more copies exist
and shredded them in order to copy or scan them.
On preservation:
Me, I'd love to know where the engineering library (ML4-1) of DEC, a massive
quantity of aperture cards, went. Generally, if I needed or wanted something,
it was available (during my time at DEC) if I had a valid part or model number.
To me that was a preserve-at-all-costs case, including the systems used to
retrieve and print.
Allison