Text encoding Babel. Was Re: George Keremedjiev
Fred Cisin
cisin at xenosoft.com
Sun Nov 25 19:34:36 CST 2018
On Mon, 26 Nov 2018, Tomasz Rola via cctalk wrote:
> To supply this train of thought with some numbers:
>
> - my copy of Common Lisp HyperSpec claims 978 symbols (i.e. words) on
> its alphabetical index; many words have modifiers (a.k.a. keyword
> options, with default values) which increases the number at least
> twofold, IMHO, if one agrees that each combo should be counted as
> different word, to which I would say yes
>
> - I have read somewhere that Japanese pupil after graduating from
> elementary school is supposed to know 1000 kanjis by heart (there
> is a standardised set, I have a book)
Would those "modifiers of words" qualify as ADJECTIVES?
The Japanese phonetic alphabets, Katakana and Hiragana, have 46 letters
each, almost twice that with diacritics.
I have heard that Japanese Kanji has more than 50,000 words/characters
(which would fit in 16 bits, but with little room to spare). But in common
usage, 1100 to 2000 characters cover most text. Wikipedia says
that as of 2010, the student requirement is 2136.
Japanese Kanji and Chinese have substantial overlap, but there is no way
that you could squeeze both into 16 bits, without leaving out important
stuff.
Therefore, for use with current computers, 32 bits would be needed.
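To put rough numbers on that, here is a quick Python check. The 50,000 figure is only the hearsay estimate mentioned above, not an authoritative count, and the variable names are made up for illustration.

```python
# Rough capacity check; KANJI_ESTIMATE is the hearsay figure from above,
# not an authoritative count.
KANJI_ESTIMATE = 50_000

capacity_16 = 2 ** 16          # distinct values in 16 bits
capacity_32 = 2 ** 32          # distinct values in 32 bits

# Room left in 16 bits after the Kanji estimate alone, before adding
# kana, Latin letters, punctuation, or any Chinese-only characters.
spare_16 = capacity_16 - KANJI_ESTIMATE

print(capacity_16, spare_16, capacity_32)
```

The spare room in 16 bits has to hold everything else, which is why one large character set already makes it tight.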
Some games can be played with mixing sizes by doing things like setting
the high bit as a flag, giving 128 7-bit characters, plus 32768 15-bit
characters, and 2147483648 31-bit characters.
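A minimal sketch of that kind of mixed-size game, in Python. This is one possible layout, not any existing standard: the names encode and decode are made up for illustration, and it assumes that a decoder reading a byte stream needs a second flag bit to tell the 2-byte form from the 4-byte form, so the payloads here come out as 7, 14 and 30 bits rather than 7, 15 and 31.

```python
from typing import List


def encode(code: int) -> bytes:
    """Pack one character code into 1, 2 or 4 bytes (hypothetical layout)."""
    if code < 0:
        raise ValueError("negative code")
    if code < 1 << 7:
        # 0xxxxxxx: high bit clear, 7-bit payload
        return bytes([code])
    if code < 1 << 14:
        # 10xxxxxx xxxxxxxx: 14-bit payload
        return (0x8000 | code).to_bytes(2, "big")
    if code < 1 << 30:
        # 11xxxxxx + 3 more bytes: 30-bit payload
        return (0xC0000000 | code).to_bytes(4, "big")
    raise ValueError("code too large for this sketch")


def decode(data: bytes) -> List[int]:
    """Unpack a byte stream produced by encode() into character codes."""
    codes = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            size, mask = 1, 0x7F
        elif first < 0xC0:
            size, mask = 2, 0x3FFF
        else:
            size, mask = 4, 0x3FFFFFFF
        chunk = data[i:i + size]
        if len(chunk) < size:
            raise ValueError("truncated input")
        # Strip the flag bits to recover the payload.
        codes.append(int.from_bytes(chunk, "big") & mask)
        i += size
    return codes


sample = [0x41, 0x3042, 0x20000]   # small, medium and large codes
stream = b"".join(encode(c) for c in sample)
assert decode(stream) == sample
```

The cost of the flag bits is that common small codes stay at one byte while rare large ones take four, which is the trade the mixed sizes are meant to buy.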