In article <456F030B.4080202 at yahoo.co.uk>,
Jules Richardson <julesrichardsonuk at yahoo.co.uk> writes:
Richard wrote:
To be honest, almost every time I have tried to OCR something (even a
pristine original), it was simply faster and more accurate to type it
in myself. [...]
Oh, I agree. Twenty years down the line I expect it'll be a lot better though.
Ahem. That's what they said 20 years ago! :-) Yeah, it is better now
than it was, but it's still pretty frustrating to use, particularly
when compared against the hype of the products.
I think the worst is when you have an original that is several
generations of 1970s or 1980s copier away from the real document.
Then there's the aspect of printer output where the character
baselines aren't lined up on the printout, like old line printer
listings and stuff.
Hence my feeling that bi-level just isn't good enough for some docs, because
it won't necessarily discriminate between real text and a hair / dirt / pen
mark where greyscale *might*. It's not infallible either of course - a blue
biro mark might be indistinguishable from the faded text below it after
scanning; give it five years and I'll probably be advocating full-colour
scans :-)
If I was scanning a low quality ancient original, then yeah, I'd
probably do grayscale to capture as much detail as possible for my
"archive" original and then put out a bi-level PDF for download.
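
As a rough sketch of why keeping the grayscale "archive" copy helps: the
bi-level version can be regenerated later with a different threshold, which
isn't possible if the scanner made that decision at scan time. The function
name to_bilevel, the threshold values, and the pixel data below are all made
up for illustration; a real workflow would use an imaging tool rather than
nested lists.

```python
# Hypothetical helper: threshold a grayscale page (0 = black, 255 = white)
# into a bi-level page (1 = ink, 0 = paper).
def to_bilevel(gray_rows, threshold=128):
    return [[1 if px < threshold else 0 for px in row] for row in gray_rows]

# Tiny invented "scan": dark text (30), a faint dirt speck (150), paper (240).
gray = [
    [240, 30, 30, 240],
    [240, 150, 240, 240],
]

strict = to_bilevel(gray, threshold=100)  # speck falls on the paper side
loose = to_bilevel(gray, threshold=200)   # speck is kept as ink
```

With only a bi-level scan, whichever of those two results the scanner
picked is the one you're stuck with.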
--
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
<http://www.xmission.com/~legalize/book/download/index.html>
Legalize Adulthood! <http://blogs.xmission.com/legalize/>