If you OCR, always archive the bitmaps too - Re: Regarding Manuals

27 Sep 2015

Easier said than done, but a big data solution could be applied to OCR.
Eventually Google's big data OCR tech will be made available as open source
to read PDFs.  Until then, high-quality bitmaps are the logical way to scan,
if you ask me.  Eventually, throw all docs into a big data lake and let them
churn in there until statistical OCR is applied, a superior approach vs.
font pattern matching alone.
Bill Degnan
twitter: billdeg
vintagecomputer.net
On Sep 27, 2015 12:22 PM, "Pontus Pihlgren" <pontus at update.uu.se>
wrote:
...
  On Sun, Sep 27, 2015 at 04:08:07PM +0200, Johnny Billquist wrote:

 I don't have problems reading the current scans, as such. But when
 having ten of these open at the same time, and scrolling through
 them, it becomes obvious that the bitmaps are heavy. It can take a
 while for the screen to be updated. Not to mention the problems you
 sometimes hit with searching...

 It seems to me that a better tool could solve the issue: one that
 could display the OCR'd content only, and the scanned content
 only when desired, for instance when you suspect an error.
 Is there such a reader? Is the content organised to make it
 possible?
 /P
