Re:
On Jun 24, 2011, at 1:12 PM, Fred Cisin wrote:
On Fri, 24 Jun 2011, Dan Gahlinger wrote:
> I have a few rather thick text printouts from the mid-1970's on 132
> ...
Probabilistic ranking can do quite a bit if set up properly. For example,
what characters would be most likely after a 'Q'? ('U', period, comma, or
space) What are the most likely characters following a space? (Hint:
AFTER A SPACE, it is NOT ETAOINSHRLDU)
The OCR software can start with a substantial DB of fonts.
Your last question, about the space, got me curious...
I took your email (including the embedded quote of Dan's email) and counted
the letters that follow a space.
I took the liberty of fixing a few typos (e.g., "paper,standard" was
changed to "paper, standard"; otherwise that "s" wouldn't be counted),
and I treated the first letter of a word after a paragraph or sentence
break as still being after a space.
I counted uppercase letters with their lowercase counterparts.
The four most popular letters were four of the five in "ETAOI":
t (# instances: 61)
i (# instances: 49)
a (# instances: 47)
o (# instances: 37)
c (# instances: 35)
w (# instances: 32)
b (# instances: 28)
s (# instances: 28)
f (# instances: 20)
p (# instances: 17)
d (# instances: 14)
e (# instances: 14)
n (# instances: 14)
l (# instances: 13)
r (# instances: 12)
h (# instances: 11)
g (# instances: 8)
m (# instances: 8)
u (# instances: 6)
j (# instances: 3)
k (# instances: 3)
q (# instances: 2)
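
For anyone who wants to repeat the count on their own text, here is a rough Python sketch of the kind of tally described above. It is not the tool actually used for these numbers. The function name letters_after_space and the sample string are made up for illustration, and it assumes that a letter counts only when it sits at the very start of the text or directly after whitespace, so the "s" in "paper,standard" and a letter hidden behind an opening parenthesis are both skipped.

```python
import re
from collections import Counter


def letters_after_space(text):
    """Tally letters that start the text or directly follow whitespace.

    Uppercase letters are folded into their lowercase counterparts.
    """
    # (?<!\S) succeeds at the start of the text or right after whitespace,
    # so paragraph and sentence breaks are treated the same as a space.
    return Counter(ch.lower() for ch in re.findall(r"(?<!\S)[A-Za-z]", text))


# Hypothetical sample text, not the email that was actually counted.
sample = "The quick brown fox. The paper,standard trick (quoted) again"
counts = letters_after_space(sample)

# Most frequent first; ties are listed alphabetically.
for letter, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{letter} (# instances: {n})")
```

Whether a letter behind an opening quote or parenthesis should count is a judgment call; changing the regular expression to skip leading punctuation would shift the totals somewhat.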
But I'm curious: which letters did you think would appear most frequently?
Stan (probably should be working instead :) Sieler