Graham Toal wrote:
My only claim to being *possibly* on-topic is the age of the article
(1974) and the fact that it inspired many of the early phoneme-driven
speech synthesizers (Votrax, etc.).
Does anyone have access to a suitably good engineering
library with a copy of:
McIlroy, M D, "Synthetic English Speech by Rule",
Bell Telephone Labs, CSTR #14, 1973 (though I have
also seen it referenced as 1974!)
or:
Ainsworth, W A, "A System for Converting English Text
to Speech", IEEE Trans Audio & Electroacoustics AU-21 #3
pp 288-290, 1973
The former is far more interesting to me than the
latter :-(
Just maybe I have either or both; I'll check when I get home.
Excellent! I would be surprised if you *did*. I have
pretty much resigned myself to a trip to the university's
engineering library (though I think I will wait until some
of this heat and humidity disappears...)
You know about the old post on net.sources?
Have a look at some of the stuff in here:
http://www.gtoal.com/wordgames/text2speech/
This seems to be the NRL ruleset embellished (for use
with a spell-checker? -- if so, you might want to look
at things like double metaphone for the approach you
are/were taking...)
It's the same vintage, may be of interest.
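(If it's the spell-checker angle you're after, the usual trick is to
collapse each word to a rough phonetic key so that near-misses hash
together. Just to show the shape of the idea -- this is a toy key
function, NOT the real Double Metaphone rules:)

/* Toy phonetic-key function, loosely in the spirit of metaphone:
 * collapse a word to a crude sound-alike key so a spell-checker can
 * bucket "foto" with "photo".  The rules are illustrative only --
 * this is NOT the real Double Metaphone table.
 */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static void phonetic_key(const char *word, char *key, size_t keylen)
{
    size_t k = 0;
    char prev = '\0';

    for (size_t i = 0; word[i] && k + 1 < keylen; i++) {
        char c    = (char)toupper((unsigned char)word[i]);
        char next = (char)toupper((unsigned char)word[i + 1]);

        if (!isalpha((unsigned char)c))
            continue;

        /* a few digraph rules, purely as examples */
        if (c == 'P' && next == 'H')      { c = 'F'; i++; }  /* PH -> F         */
        else if (c == 'C' && next == 'K') { i++; }           /* CK -> K (below) */
        else if (c == 'G' && next == 'H') { i++; }           /* GH -> G         */

        if (c == 'C') c = 'K';                               /* hard C -> K     */

        /* drop vowels after the first letter; they carry little signal */
        if (i > 0 && strchr("AEIOUY", c))
            continue;

        /* collapse runs of the same letter */
        if (c == prev)
            continue;

        key[k++] = c;
        prev = c;
    }
    key[k] = '\0';
}

int main(void)
{
    char k1[16], k2[16];

    phonetic_key("photograph", k1, sizeof k1);
    phonetic_key("fotograf",   k2, sizeof k2);
    printf("%s %s\n", k1, k2);      /* both collapse to FTGRF */
    return 0;
}

Anything that keys a dictionary on that sort of signature will pull up
"photo" when the user types "foto", which is really all the
spell-checker wants from the rules.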
Yes, I have a version of the NRL code that I translated from SNOBOL
~25 years ago. But it has the same limitations (in terms of
pronunciation accuracy) as the original ruleset.
I was hoping a peek at McIlroy's and Ainsworth's rules
would shed some additional insights not immediately
discernible from the Elovitz et al. paper.
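(For anyone following along: an Elovitz-style rule is A[B]C=D --
rewrite fragment B as phoneme string D when it is preceded by context
A and followed by context C, with wildcard classes like # and ^ for
the contexts. Reduced to literal-only contexts, the matcher is just
this; the rules and names below are illustrative, not the actual NRL
table:)

/* Stripped-down sketch of NRL-style letter-to-sound matching.
 * Real rules (Elovitz et al.) use wildcard context classes (#, ^, :, ...);
 * here contexts are literal strings only, just to show the control flow.
 */
#include <stdio.h>
#include <string.h>

struct rule {
    const char *left;   /* required left context  ("" = don't care) */
    const char *body;   /* grapheme fragment to consume              */
    const char *right;  /* required right context ("" = don't care)  */
    const char *phone;  /* phoneme string to emit ("" = silent)      */
};

/* A handful of made-up example rules; longer fragments listed first. */
static const struct rule rules[] = {
    { "",  "igh", "", "AY" },
    { "",  "th",  "", "TH" },
    { "c", "k",   "", ""   },   /* the k in "ck" is silent */
    { "",  "a",   "", "AE" },
    { "",  "c",   "", "K"  },
    { "",  "t",   "", "T"  },
    /* ... a real table has a catch-all for every letter ... */
};

static void to_phonemes(const char *word)
{
    size_t pos = 0, len = strlen(word);

    while (pos < len) {
        const struct rule *hit = NULL;

        for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
            const struct rule *ru = &rules[r];
            size_t bl = strlen(ru->body), ll = strlen(ru->left);

            if (strncmp(word + pos, ru->body, bl) != 0)
                continue;                               /* fragment  */
            if (ll > pos || strncmp(word + pos - ll, ru->left, ll) != 0)
                continue;                               /* left ctx  */
            if (strncmp(word + pos + bl, ru->right, strlen(ru->right)) != 0)
                continue;                               /* right ctx */
            hit = ru;
            break;
        }

        if (hit) {
            if (hit->phone[0])
                printf("%s ", hit->phone);
            pos += strlen(hit->body);
        } else {
            pos++;                /* no rule for this letter: skip it */
        }
    }
    putchar('\n');
}

int main(void)
{
    to_phonemes("thick");   /* prints: TH K   (no 'i' rule in the toy set) */
    to_phonemes("cat");     /* prints: K AE T                              */
    return 0;
}

The accuracy problems all live in the table, not in that loop -- which
is why I want to see what rules McIlroy and Ainsworth chose.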
More modern synthesizers suffer from big-time code bloat
(e.g., flite can easily grow to 10-20MB while executing;
festival ten times that...). For "doing things on the cheap"
you need to look back in time :-(
Also, I hacked the Navy code around a bit to make it more accurate
and to assist with using a large phonetic word list. And to
parameterize the tables from an editable data file rather than having
them hard-coded in the C source.
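(Nothing fancy on the data-file side -- one rule per line, fields
split by a delimiter, '-' for an empty field. A rough sketch of that
kind of loader; the format and names are just illustrative:)

/* Sketch of loading the rules from an editable text file instead of
 * compiling them into the C source.  One rule per line:
 *     left|body|right|phoneme        (use '-' for an empty field)
 * The layout and names here are illustrative, not the real format.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct rule { char *left, *body, *right, *phone; };

static char *field(const char *s)
{
    /* '-' in the file stands for an empty field (e.g. no context) */
    return strdup(strcmp(s, "-") ? s : "");
}

static struct rule *load_rules(const char *path, size_t *count)
{
    FILE *fp = fopen(path, "r");
    struct rule *tab = NULL;
    size_t n = 0;
    char line[128];

    if (!fp)
        return NULL;

    while (fgets(line, sizeof line, fp)) {
        char *l = strtok(line, "|");
        char *b = strtok(NULL, "|");
        char *r = strtok(NULL, "|");
        char *p = strtok(NULL, "|\r\n");

        if (!l || !b || !r || !p)
            continue;               /* skip blank or malformed lines */

        tab = realloc(tab, (n + 1) * sizeof *tab);
        tab[n].left  = field(l);
        tab[n].body  = field(b);
        tab[n].right = field(r);
        tab[n].phone = field(p);
        n++;
    }
    fclose(fp);
    *count = n;
    return tab;
}

A line like -|th|-|TH then takes the place of a hard-coded
initializer, so the ruleset can be tweaked without recompiling.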
I'd already done that. As well as shrinking the tables
considerably (I think my table is less than 2.5KB including
delimiters, pointers, etc.). There are also other efficiency
hacks you can do to speed up the searches, etc.
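(The big win is indexing the table by the first letter of the rule
fragment, so a lookup only walks that letter's slice of one packed
string. Something along these lines -- the delimiters, example rules,
and offsets are illustrative, not my actual encoding:)

/* Sketch of a packed rule table: every rule for a given letter is
 * stored back-to-back in one string, fields split by '/' and rules
 * by ';', with a 26-entry offset index so a lookup only scans that
 * letter's slice.  Delimiters, rules, and offsets are illustrative.
 */
#include <stdio.h>

/* left '/' body '/' right '/' phoneme ';'  -- empty field = no context */
static const char packed[] =
    "/a//AE;"             /* 'a' rules start at offset 0  */
    "/ch//CH;/c//K;"      /* 'c' rules start at offset 7  */
    "/th//TH;/t//T;";     /* 't' rules start at offset 21 */

/* offset of each letter's first rule in packed[]; -1 = no rules */
static const short index26[26] = {
     0, -1,  7, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, 21, -1, -1, -1, -1, -1, -1,
};

/* Return the slice of rules for lowercase letter c, or NULL. */
static const char *rules_for(char c)
{
    if (c < 'a' || c > 'z' || index26[c - 'a'] < 0)
        return NULL;
    return packed + index26[c - 'a'];
}

int main(void)
{
    /* The matcher walks the slice rule by rule (longest fragment
     * first) and stops when the fragments no longer begin with the
     * key letter.  Here we just show where each slice starts.
     */
    printf("'c' rules: %.14s\n", rules_for('c'));
    printf("'t' rules: %s\n",    rules_for('t'));
    return 0;
}

Twenty-six short offsets plus one string is the whole table, and a
lookup never touches the rules for any other letter.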
The algorithm is considerably improved if you subject the words to
TeX's hyphenation algorithm before applying the grapheme->phoneme
rewrite rules. Hyphenation points roughly correspond to phoneme
boundaries, and stop words like 'haphazard' from sounding half-assed.
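Concretely, it's Liang's pattern scheme: each pattern carries digits
between its letters, an odd digit votes for a break at that spot, an
even digit vetoes one, and the highest digit wins across all matching
patterns. Split at the odd positions, then feed each chunk through the
rewrite rules. A toy sketch -- the four patterns below are made-up
stand-ins chosen to split 'haphazard', not TeX's real pattern set:

/* Sketch of the hyphenate-first idea: run a (tiny, made-up) Liang-style
 * pattern set over the word and mark the odd-scored positions as break
 * points; each chunk would then go through the letter-to-sound rules
 * separately.  The patterns below are stand-ins, not TeX's real set.
 */
#include <stdio.h>
#include <string.h>

/* Digits between a pattern's letters vote on a break at that spot:
 * odd = break, even = don't, highest digit wins.  '.' anchors the
 * pattern to the start or end of the word.
 */
static const char *patterns[] = { "p1h", "z1a", "a2z", ".h2a" };

static void hyphen_points(const char *word, int *score)
{
    char padded[64];
    size_t wl;

    snprintf(padded, sizeof padded, ".%s.", word);
    wl = strlen(padded);
    memset(score, 0, (wl + 1) * sizeof *score);

    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        const char *pat = patterns[p];

        for (size_t start = 0; start < wl; start++) {
            size_t i = start, j;
            int ok = 1;

            /* try to match the pattern's letters here, skipping digits */
            for (j = 0; pat[j]; j++) {
                if (pat[j] >= '0' && pat[j] <= '9')
                    continue;
                if (padded[i] != pat[j]) { ok = 0; break; }
                i++;
            }
            if (!ok)
                continue;

            /* it matched: record its digit votes, keeping the maximum */
            for (i = start, j = 0; pat[j]; j++) {
                if (pat[j] >= '0' && pat[j] <= '9') {
                    if (pat[j] - '0' > score[i])
                        score[i] = pat[j] - '0';
                } else {
                    i++;
                }
            }
        }
    }
}

int main(void)
{
    const char *word = "haphazard";
    int score[64];

    hyphen_points(word, score);

    /* print the word with '-' wherever the score is odd: hap-haz-ard */
    for (size_t i = 0; word[i]; i++) {
        if (i > 0 && (score[i + 1] & 1))   /* +1: score[] is on ".word." */
            putchar('-');
        putchar(word[i]);
    }
    putchar('\n');
    return 0;
}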
Ah, that's a clever idea! Though it depends on how much
overhead that adds to the complexity of the algorithm. I
am REALLY squeezing hard to get this, a Klatt-style
synthesizer, OS, etc. into a small application-specific
CPU core so every byte has to pay for itself :>
Thanks!
--don