Regarding Manuals

Johnny Billquist bqt at update.uu.se
Sat Sep 26 15:28:09 CDT 2015


On 2015-09-26 12:16, Johnny Billquist wrote:
> On 2015-09-25 22:35, Al Kossow wrote:
>> I have been going back and applying OCR to the ones on bitsavers.
>> Are there some in particular that you have a problem with?
>
> Aha. I wasn't aware of that. I've downloaded copies many years ago that
> I've been keeping locally. I'll check out the current versions on
> bitsavers then.

Al, exactly how have they been OCRed? Looking at them, it would appear 
that what you see is still the bitmaps of all the pages, but then you 
have the basic text also available for selection/searching.

My issue with that is that the documents are huge, and the experience 
of just scrolling through them is pretty bad.

Sadly I don't even remember what software I used for OCR about 10 years 
ago, but I had something for Windows back then that actually recognized 
the fonts and produced a plain Word document from the OCR process. It 
was a really nice piece of software that preserved formatting, fonts 
and all. I have a short example of the results at 
http://www.update.uu.se/~bqt/Clarkson.pdf, which is just a scan of two 
pages from a book. I created the PDF from Word.
A process like that is what I'd like, except for figures, which need to 
be kept as bitmaps, I suspect.

	Johnny

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

