Hi,
I spent a few hours last night playing with different
scanning options and figured I would share my observations.
Definitely not a "scientific experiment" but, rather, "just
tinkering".
Image in question was just a page of text -- probably 8-10pt.
Laid out in two columns, quite a bit of whitespace. The
original image dimensions are about 8.5x12" (yes, 12, not 11).
[Sorry, in retrospect I should have used an 8.5x11 image as
this would be easier for most folks to relate to :< Instead,
I just set the scanner to scan half the available area (it's
a B-size scanner)]
First, I did a monochrome scan at 400dpi (which is where I
tend to do most of my scans). The resulting TIFF file was 2MB.
When viewed "on screen", the TIFF file (i.e., eliminating any
effects of the scanner software) was quite readable. No signs
of jaggies, etc.
I ran that image through various compressors (still sticking with
TIFF). "Packed bits" yielded a file size of 360K -- as did
Huffman encoding. LZW dropped this to 220K. CCITT3-1D encoding
dropped it to 217K while CCITT3-2D brought it down to 131K.
And, CCITT4 brought it down to 100K.
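[Aside: "Packed bits" here is presumably TIFF's PackBits scheme, a byte-oriented run-length code. The sketch below is a minimal pure-Python encoder/decoder, not the scanner software's implementation; the function names and the sample row are made up for illustration. It shows why a page with lots of whitespace shrinks so much: long runs of identical bytes collapse to two bytes each. The CCITT schemes go further by coding run lengths of individual bits.]

```python
def packbits_encode(data: bytes) -> bytes:
    """Run-length encode one row of bytes using the PackBits scheme."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)      # header 129..255: repeat the next byte
            out.append(data[i])
            i += run
        else:
            # Gather literal bytes until a repeat starts (capped at 128).
            start = i
            i += 1
            while (i < n and i - start < 128
                   and not (i + 1 < n and data[i] == data[i + 1])):
                i += 1
            out.append(i - start - 1)  # header 0..127: copy the next k+1 bytes
            out += data[start:i]
    return bytes(out)


def packbits_decode(packed: bytes) -> bytes:
    """Invert packbits_encode."""
    out = bytearray()
    i = 0
    while i < len(packed):
        header = packed[i]
        i += 1
        if header < 128:               # literal block
            out += packed[i:i + header + 1]
            i += header + 1
        elif header > 128:             # repeated byte
            out += packed[i:i + 1] * (257 - header)
            i += 1
        # header == 128 is a no-op by convention
    return bytes(out)


# Hypothetical 1-bit scanline, 3400 pixels = 425 bytes, blank except
# for a few bytes of "ink" in the middle.
row = b"\x00" * 200 + b"\x3c\x7e\x18" + b"\x00" * 222
packed = packbits_encode(row)
print(len(row), "->", len(packed))
```

On this made-up row the 425 input bytes come out as 12. Real text rows have far more transitions, so the overall saving is much smaller, which fits the 2MB to 360K figure above.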
I then ran the same image at 800dpi (what the heck, let's live
dangerously!!). As expected, the original TIFF grew to 8M.
The CCITT4 variant grew to 250K. (I didn't bother with all
the other encodings as these two represent the apparent
extremes of monochrome representations *EASILY* AVAILABLE TO ME).
Next, I scanned the same image at 400dpi in *color*.
A 24 bit TIFF took a whopping 55MB. (can you say, "Sorry,
but we don't got no bananas...").
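[Aside: the uncompressed sizes can be sanity-checked with a little arithmetic. This is only a sketch: it assumes the page is exactly 8.5x12" and ignores TIFF headers and any padding, and the function name is made up.]

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data size in bytes (no headers, no padding)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


for label, dpi, bpp in [
    ("400dpi mono", 400, 1),
    ("800dpi mono", 800, 1),
    ("400dpi 24-bit", 400, 24),
]:
    size = raw_scan_bytes(8.5, 12, dpi, bpp)
    print(f"{label:>14}: {size / 1e6:.2f} MB")
```

The two monochrome results (about 2.0 and 8.2 MB) line up with the 2MB and 8M files reported. The 24-bit result is nearer 49 MB than 55MB; since the page size was only "about" 8.5x12", that gap may simply be a larger scan area or file overhead.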
I then tried to save the image as a JPEG -- *guessing* at
appropriate compression/smoothing factors to get the resulting
image size down to ~1MB. (for reference, ASSUMING THESE
"settings" ARE PORTABLE, 4:2:2, 77 compression, 10 smoothing,
optimized but NOT progressive). I got lucky (?) and this
first pass gave me a 530K image.
With the 250KB *800*dpi B&W C4 TIFF in mind, I decided to
push the file size even smaller. (compression increased to 90)
This resulted in a 360K image. Even more squeezing (compr = 95)
got this down to 250K.
However, the *quality* of the image was very disappointing!
The 530KB version was quite "fuzzy" (not "jaggies", since JPEGs
are more continuous tone than a B&W TIFF, but, rather, "blurry").
The 360KB version began introducing noticeable artifacts that
were clearly not present in the original image. This got
worse in the 250K version.
Bottom line: the 100KB B&W TIFF was much better looking
than even the 530KB JPEG. And the 250KB B&W TIFF was so
"fine" that I suspect it is overkill (I doubt anyone or
anyTHING -- i.e. software -- could discriminate between
that at 800dpi and the 400dpi version).
I had earlier tried some gr[ea]y scale scans and convinced
myself I must be doing something wrong (as the sizes were
just so much larger) so I didn't pursue them. In hindsight,
a more scientific approach would try to JPEG encode those
monochrome representations as well. (I'm through experimenting
as I have the answer *I* want/need) Somehow, I doubt they
will prove to be more economical than the C4 TIFFs.
[N.B. the gr[ea]yscale scans are much "softer" on the eyes
(no doubt due to the continuum of "value")]
So, to answer *my* initial question (ages ago), 400dpi C4 TIFFs
are definitely "adequate". 800dpi are overkill. And, at
~100KB per page, they are quite "affordable".
We now return you to your regularly scheduled program...
Don wrote:
Hi,
I spent a few hours last night playing with different
scanning options and figured I would share my observations.
Bottom line: the 100KB B&W TIFF was much better looking
than even the 530KB JPEG. And the 250KB B&W TIFF was so
"fine" that I suspect it is overkill (I doubt anyone or
anyTHING -- i.e. software -- could discriminate between
that at 800dpi and the 400dpi version).
So, to answer *my* initial question (ages ago), 400dpi C4 TIFFs
are definitely "adequate". 800dpi are overkill. And, at
~100KB per page, they are quite "affordable".
I'm glad that you found out, what most people who really scan b&w
text&schematics knew all the time ;-)
And please, not another discussion about jpeg & b&w scans again.
The last BS about it is barely weeks ago ...
e.stiebler wrote:
Don wrote:
Hi,
I spent a few hours last night playing with different
scanning options and figured I would share my observations.
Bottom line: the 100KB B&W TIFF was much better looking
than even the 530KB JPEG. And the 250KB B&W TIFF was so
"fine" that I suspect it is overkill (I doubt anyone or
anyTHING -- i.e. software -- could discriminate between
that at 800dpi and the 400dpi version).
So, to answer *my* initial question (ages ago), 400dpi C4 TIFFs
are definitely "adequate". 800dpi are overkill. And, at
~100KB per page, they are quite "affordable".
I'm glad that you found out, what most people who really scan b&w
text&schematics knew all the time ;-)
For *schematics*, I would consider 400dpi far too coarse.
I'm already experimenting with 800 and 1200dpi to see where
the "sweet spot" is for them. But 400 just didn't cut it :-(
(unless you're scanning DEC printsets which are usually pretty
coarse to begin with! :> )
And please, not another discussion about jpeg & b&w scans again.
The last BS about it is barely weeks ago ...
On 9/3/2006 at 9:02 AM Don wrote:
>> I spent a few hours last night playing with
>> different scanning options and figured I would share my observations.
Your results pretty much agreed with what I've observed scanning music.
One question not answered to my satisfaction yet is "how many grey levels
is enough?" I've got a suspicion that as little as 4 may be perfectly
adequate for most line drawings.
If it's any comfort, the Library of Congress seems to agree with your
findings, but they tend to be split over reserving grey scale scanning for
things like pictorial and cover art and leave musical scores as B&W.
However, that's only in one collection--other collections scan everything
in greyscale. But TIFF is the file format.
Cheers,
Chuck
Chuck Guzis wrote:
On 9/3/2006 at 9:02 AM Don wrote:
>> I spent a few hours last night playing
>> with different scanning options and figured I would share my observations.
Your results pretty much agreed with what I've observed scanning music.
One question not answered to my satisfaction yet is "how many grey levels
is enough?" I've got a suspicion that as little as 4 may be perfectly
adequate for most line drawings.
Many TIFF decoders complain if you use "unusual" BPP values.
I'd have to reread the spec to see if it is actually *allowed*
but I had to write my own encoder/decoder to process 2BPP
images. Shirley, this would discourage any such use if the
resulting images were not "widely portable".
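[Aside: for anyone wondering what a 4-level, 2BPP representation involves, here is a minimal sketch. It is not the encoder/decoder mentioned above; the function names, the simple "top two bits" quantizer, and the high-bits-first packing order are all assumptions for illustration.]

```python
def quantize_to_2bpp(grey_row):
    """Map 8-bit grey values (0-255) onto 4 levels (0-3) by keeping the top 2 bits."""
    return [g >> 6 for g in grey_row]


def pack_2bpp(levels):
    """Pack four 2-bit pixels per byte, leftmost pixel in the high bits."""
    padded = list(levels) + [0] * (-len(levels) % 4)  # pad row to a whole byte
    out = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = padded[i:i + 4]
        out.append((a << 6) | (b << 4) | (c << 2) | d)
    return bytes(out)


def unpack_2bpp(packed, count):
    """Recover the first `count` 2-bit pixel values from packed bytes."""
    levels = []
    for byte in packed:
        levels.extend((byte >> shift) & 0b11 for shift in (6, 4, 2, 0))
    return levels[:count]


# Hypothetical six-pixel row of 8-bit grey values.
row = [0, 64, 128, 255, 200, 10]
levels = quantize_to_2bpp(row)
packed = pack_2bpp(levels)
print(levels, packed.hex())
```

At 2BPP a row takes twice the space of a bi-level row before compression, and a quarter of an 8-bit greyscale row.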
If it's any comfort, the Library of Congress seems to agree with your
findings, but they tend to be split over reserving grey scale scanning for
things like pictorial and cover art and leave musical scores as B&W.
However, that's only in one collection--other collections scan everything
in greyscale. But TIFF is the file format.
e.stiebler wrote:
And please, not another discussion about jpeg & b&w scans again.
The last BS about it is barely weeks ago ...
<grrr>
I thought it was decided by Jay that things related to preservation were
on-topic (even though this is the off-topic list!).
</grrr>
Jules Richardson wrote:
e.stiebler wrote:
And please, not another discussion about jpeg & b&w scans again.
The last BS about it is barely weeks ago ...
<grrr>
I thought it was decided by Jay that things related to preservation were
on-topic (even though this is the off-topic list!).
</grrr>
It is on topic for whatever measure of "on topic" I know and like. But
we talked about it so many times, and why are we having an archive of
classiccomp, if nobody reads it ?
e.stiebler wrote:
and why are we having an archive of classiccomp, if nobody reads it ?
I thought that for technical reasons (i.e. it takes many hours to rebuild the
index, and because Jay hasn't figured out how to clone himself ;) the archive
was at the stage of: "only works between the hours of 6-8pm, if you happen to
be on US time". Maybe it's fixed now and I missed a post about it...
J.
Jules Richardson <julesrichardsonuk at yahoo.co.uk> wrote:
e.stiebler wrote:
and why are we having an archive of classiccomp, if nobody reads it ?
I thought that for technical reasons (i.e. it takes many hours to rebuild the
index, and because Jay hasn't figured out how to clone himself ;) the archive
was at the stage of: "only works between the hours of 6-8pm, if you happen to
be on US time". Maybe it's fixed now and I missed a post about it...
Hey, I use the archive occasionally. Usual method of using:
1. Download the gzipped archive of each month's messages. (Typically 1
Mbyte for a month of cctalk).
2. Uncompress the archives.
3. Grep for what I'm looking for.
Often I use it as a substitute for reading the E-mails.
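[Aside: steps 2 and 3 above can be combined so the archives never need to be uncompressed on disk. A minimal sketch, assuming the monthly files have already been downloaded into one directory and end in ".txt.gz"; the directory name, suffix, and function name are assumptions, not anything the list software guarantees.]

```python
import gzip
import re
from pathlib import Path


def grep_archives(directory, pattern, suffix=".txt.gz"):
    """Yield (filename, line_number, line) for each matching line in gzipped archives."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(directory).glob(f"*{suffix}")):
        # "rt" decompresses on the fly and decodes to text; undecodable
        # bytes are replaced rather than aborting the search.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if regex.search(line):
                    yield path.name, number, line.rstrip("\n")
```

Usage would be something like `for hit in grep_archives("cctalk-archives", r"CCITT|group ?4"): print(*hit, sep=":")`.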
Realistically, on a discussion list most traffic will be people who
want to talk (even if it's just rehashing an old topic for the umpteenth
time) instead of actually exchanging information.
Tim.
Don wrote:
I had earlier tried some gr[ea]y scale scans and convinced
myself I must be doing something wrong (as the sizes were
just so much larger) so I didn't pursue them.
Hmm, a greyscale TIFF test would be the most important one I'd say - something
like 16 levels; 256 *might* be overkill.
What's readable to a human is important - but equally so is encoding enough
information such that scans of sub-standard source material (dirty, torn etc.)
could be post-processed on a per-case basis if needs be, before passing to a
subsequent OCR step. This is the bit where bi-level scanning tends to fall
down as it depends where the level between white/black is as to what gets
encoded for "damaged" sections of a document. On pristine source material and
at a high enough resolution (so that viewing on a screen scaled-down gets rid
of the jagged edges) I'm sure it's otherwise fine.
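[Aside: the threshold problem is easy to see on a toy example. The scanline below is invented (a crisp stroke followed by a faded stroke sitting on a stained patch), and the function name is made up; it is a sketch of plain fixed-threshold binarization, not of what any particular scanner does.]

```python
def binarize(grey_row, threshold):
    """Return 1 (black) where the 8-bit grey value is darker than threshold, else 0."""
    return [1 if g < threshold else 0 for g in grey_row]


# Hypothetical scanline: paper white ~245, crisp ink ~30,
# stained paper ~180, faded ink on the stain ~120.
row = [250, 245, 30, 25, 240, 180, 175, 120, 115, 170, 185, 248]

for threshold in (100, 128, 190):
    bits = binarize(row, threshold)
    print(f"{threshold:3d}", "".join("#" if b else "." for b in bits))
```

At 100 the faded stroke vanishes, at 128 both strokes survive, and at 190 the stain turns solid black and swallows the faded stroke. With a greyscale scan that choice can be revisited later; with a bi-level scan it is baked in at scan time.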
[N.B. the gr[ea]yscale scans are much "softer" on the eyes
(no doubt due to the continuum of "value")]
Lots of the stuff I come across tends to have at least one or two
continuous-tone photos inside, which if I was scanning at bi-level (which I
never do) I'd have to treat as specific cases.
Jules Richardson wrote:
Don wrote:
I had earlier tried some gr[ea]y scale scans and convinced
myself I must be doing something wrong (as the sizes were
just so much larger) so I didn't pursue them.
Hmm, a greyscale TIFF test would be the most important one I'd say -
something like 16 levels; 256 *might* be overkill.
What's readable to a human is important - but equally so is encoding
enough information such that scans of sub-standard source material
(dirty, torn etc.) could be post-processed on a per-case basis if needs
be, before passing to a subsequent OCR step. This is the bit where
bi-level scanning tends to fall down as it depends where the level
between white/black is as to what gets encoded for "damaged" sections of
a document. On pristine source material and at a high enough resolution
(so that viewing on a screen scaled-down gets rid of the jagged edges)
I'm sure it's otherwise fine.
I tend to take good care of my drawings -- since they usually
are the ONLY "deliverable" that I produce ;-) So, even 30
year old designs are still on nice crisp (unwrinkled) paper!
(Hence the desire to scan all this stuff... keeping drawings
that "pristine" after all that time takes a LOT of effort)
[N.B. the gr[ea]yscale scans are much "softer" on the eyes
(no doubt due to the continuum of "value")]
Lots of the stuff I come across tends to have at least one or two
continuous-tone photos inside, which if I was scanning at bi-level
(which I never do) I'd have to treat as specific cases.