On Wed, 2009-09-30 20:59:06 +0200, Oliver Lehmann <lehmann at ans-netz.de> wrote:
Jan-Benedict Glaw wrote:
Until now, scanning documents was mainly about conserving the paper
and being able to share it without snail-mailing the stuff around,
which always carries the danger of losing it. But we can now
really polish the stuff. That's /not/ a substitute for Bitsavers et
al.; we need them. The PDFs over there are a perfect format for
archiving the scanned pages. But we'd place generated PDFs next to
them, containing a real table of contents, bibliography entries and
possibly even indices, alongside the OCRed text.
It all stands or falls with the quality of the printed document which
was scanned. For example, I have manuals for one of my systems (from East
Germany) which were printed really badly. The paper was also of such bad
quality that it has become extremely yellow by now. One example (if you care):
http://files.pofo.de/066.png
http://files.pofo.de/056.png
Oh... Pages in really bad shape :(
I don't think that there is any OCR software which can cope with that ;)
No way... That's where (at least for now) only a human can help. And
to be honest, I would have a hard time reading the lower part of
http://files.pofo.de/056.png for example...
Because I hate not being able to search through documents (that's why I
like online manuals: full-text search), I ended up transcribing the
manuals and invested a massive amount of time in that... ;)
Check out pages 58 and 68 in that:
http://pofo.de/P8000/notes/books/Einfuehrung_in_die_Software/1986_12/Einfue…
That's great work! What kind of workflow did you use to create the
PDF? What tools were involved? I think that, by doing that, you have
already learned a lot the hard way that others might learn from.
And the whole PDF is smaller than a single PNG (I know PNG is not the
format which should be used; the original images are saved as TIFF) ;)
Sure, plain text compresses quite well. A distorted, partially
randomized image doesn't :)
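
To make that point concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample sentence is a made-up stand-in for transcribed manual text, pseudo-random bytes stand in for a noisy scan, and zlib is used simply because it ships with Python (it is not necessarily what a PDF or PNG writer would do in detail).

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical stand-in for transcribed manual text (not from the real manuals).
text = ("The operating system loads the program into memory and starts it. "
        * 200).encode("ascii")

# Stand-in for a noisy, yellowed scan: pseudo-random bytes of the same length.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(text)))

text_ratio = compression_ratio(text)
noise_ratio = compression_ratio(noise)

print(f"text:  {text_ratio:.3f}")
print(f"noise: {noise_ratio:.3f}")
```

The repeated sentence exaggerates the effect, since real prose is less redundant, but the direction is the same: structured text shrinks a lot, while noise-like data barely shrinks at all.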
Best regards, JBG
--
Jan-Benedict Glaw jbglaw at lug-owl.de +49-172-7608481
Signature of: 17:45 <@Eimann> Hmm, the E90 no longer has a lifetime call-time counter
the second  : 17:46 <@jbglaw> Eimann: What does one need that for?
              17:46 <@jbglaw> Eimann: For me, what matters in a mobile phone is that
                              I can hear the person I'm talking to. And that the
                              person I'm talking to understands me...
              17:47 <@KrisK> jbglaw: what you mean is vodka.
              17:47 <@KrisK> jbglaw: it rings and you hear voices