OCR old software listing

31 Dec 2018

On Mon, 31 Dec 2018, Larry Kraemer via cctalk wrote:
...
  I used the libtiff-tools (Debian 8.x - 32 Bit) to extract all 61 .TIF's
 from the Multipage .tif file.  While the .tif's look descent, and
 RasterVect shows the .tif properties to be Group 4 Fax (1bpp) with 5100
 x 6600 pixels - 300 DPI, I can't get tesseract 3.x, TextBridge Classic
 2.0, or Irfanview with KADMOS Plugin to OCR any of the .tif files, with
 descent results.  I'd expect an OCR of 85 to 90 % correct conversion to
 ASCII text. 
Software listings need more accuracy than that.
How many wrong characters does it take for a program not to work?
"desCent" isn't good enough.
85 to 90 % correct is a character wrong in every 6 to 10 characters.
How many errors is that PER LINE?
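
To put rough numbers on that question, here is a minimal sketch of the
arithmetic. The 80-column line length and the idea that OCR errors are
independent and evenly spread are assumptions for illustration only; real
OCR errors tend to cluster. The function names are made up for this example.

```python
def expected_errors_per_line(accuracy, line_length=80):
    """Average number of wrong characters in one line of the given length."""
    return (1.0 - accuracy) * line_length


def chance_line_is_clean(accuracy, line_length=80):
    """Probability that every character in a line is right,
    assuming each character is wrong independently of the others."""
    return accuracy ** line_length


if __name__ == "__main__":
    # 0.85 and 0.90 are the figures from the quoted message;
    # 0.99 and 0.999 are added only for comparison.
    for accuracy in (0.85, 0.90, 0.99, 0.999):
        errors = expected_errors_per_line(accuracy)
        clean = chance_line_is_clean(accuracy)
        print(f"{accuracy:.1%} correct: "
              f"{errors:.1f} errors per 80-column line, "
              f"{clean:.2%} chance a line needs no fixing")
```

Under those assumptions, 85 % correct works out to about 12 wrong
characters in an 80-column line, and 90 % to about 8.
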
"But, you can start with that, and just fix the errors, without retyping
the rest."  Doing it that way is a desCent into madness.
BTDT.  wore out the T-shirts.
A competent typist can retype the whole thing faster than fixing an error
in every six to ten characters.
Only if there is less than one error for every several hundred characters
does "patching it" save time for a competent typist.
In general, for a competent typist, the fastest way to reposition the
cursor to the next error in the line is to simply hit the keys of the
intervening letters.
It is NOT to move the cursor with the mouse, then put your hand back on
the keys to type a character.
Using cursor motion keys is no faster for a competent typist than hitting
the keys of the letters to skip over.
TIP: display the OCR'ed text that is to be corrected in a font that
exaggerates the difference between zero and the letter 'O', and between
one and lower case 'l'.  There are some programs that will attempt to
select those based on context.
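
As an illustration of what "based on context" could mean, here is a crude
sketch. It is not how any particular program does it; the rule (look at the
immediate neighbours, and only trust neighbours that are not themselves
ambiguous) and all the names are assumptions made up for this example. It
will guess wrong on things like variable names that end in a digit (A1) or
hex constants, so treat its output as suggestions to review, not as
corrections.

```python
# Characters that OCR commonly confuses, and what each would become.
TO_DIGIT = {"O": "0", "l": "1"}
TO_LETTER = {"0": "O", "1": "l"}
AMBIGUOUS = set(TO_DIGIT) | set(TO_LETTER)


def guess_by_context(text):
    """Re-guess 0/O and 1/l from the characters on either side."""
    out = []
    for i, ch in enumerate(text):
        if ch not in AMBIGUOUS:
            out.append(ch)
            continue
        # Immediate left and right neighbours (fewer at the ends).
        neighbours = text[max(i - 1, 0):i] + text[i + 1:i + 2]
        # Only unambiguous letters and digits count as evidence.
        evidence = [c for c in neighbours
                    if c.isalnum() and c not in AMBIGUOUS]
        if evidence and all(c.isdigit() for c in evidence):
            out.append(TO_DIGIT.get(ch, ch))    # looks like part of a number
        elif evidence and all(c.isalpha() for c in evidence):
            out.append(TO_LETTER.get(ch, ch))   # looks like part of a word
        else:
            out.append(ch)                      # no clear evidence: leave it
    return "".join(out)


if __name__ == "__main__":
    print(guess_by_context("F0R I=1 T0 2O"))
```

With the sample line above, "F0R I=1 T0 2O" comes out as "FOR I=1 TO 20".
A string such as "1O0", where every neighbour is itself ambiguous, is left
alone.
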
--
Grumpy Ol' Fred                     cisin at xenosoft.com
