If you OCR, always archive the bitmaps too - Re: Regarding Manuals

Antonio Carlini a.carlini at ntlworld.com
Sun Sep 27 10:49:25 CDT 2015


On 27/09/15 15:08, Johnny Billquist wrote:
>
> Errors are always bad. Agreed. That is not something we're discussing 
> here.
>
> I don't have problems reading the current scans, as such. But when 
> having ten of these open at the same time, and scrolling through them, 
> it becomes obvious that the bitmaps are heavy. It can take a while for 
> the screen to be updated. Not to mention the problems you sometimes 
> hit with searching...
>

I think we are discussing errors. I did try to OCR stuff when I first
started scanning, but I didn't find anything that could do an even
marginally acceptable job. That's perhaps less of an issue for War and
Peace, but it is pretty serious for a technical manual.

I understand that having multiple 100MiB+ documents open at once will be
sluggish, certainly compared to those same documents once they've been
through OCR. However, if I scan something and make the raw scan
available, someone can OCR it later (and re-upload just the OCR version
if they want). If I OCR it and don't make the raw scan available, then
people are potentially stuck with whatever OCR could manage in 2015 (or
earlier), and future-you will rightly curse me ten years down the line
(I'm assuming that OCR is getting better with time, of course...).

Antonio
arcarlini at iee.org




More information about the cctalk mailing list