Text encoding Babel. Was Re: George Keremedjiev

ben bfranchuk at jetnet.ab.ca
Sun Nov 25 20:07:35 CST 2018

On 11/25/2018 6:34 PM, Fred Cisin via cctalk wrote:
> On Mon, 26 Nov 2018, Tomasz Rola via cctalk wrote:
>> To supply this train of thought with some numbers:
>> - my copy of Common Lisp HyperSpec claims 978 symbols (i.e. words) in
>>   its alphabetical index; many words have modifiers (a.k.a. keyword
>>   options, with default values), which at least doubles the number,
>>   IMHO, if one agrees that each combo should be counted as a
>>   different word, which I would
>> - I have read somewhere that a Japanese pupil, after graduating from
>>   elementary school, is supposed to know 1000 kanji by heart (there
>>   is a standardised set, I have a book)
> Would those "modifiers of words" qualify as ADJECTIVES?
> The Japanese phonetic alphabets, Katakana and Hiragana, have 46 letters
> each, almost twice that with diacritics.
> I have heard that Japanese Kanji has more than 50,000 words/characters
> (for which 16 bits would fit, but be a little risky), but that 1100 to
> 2000 words account for most common usage.
> Wikipedia says that as of 2010, the student requirement is 2136.
> Japanese Kanji and Chinese have substantial overlap, but there is no way 
> that you could squeeze both into 16 bits, without leaving out important 
> stuff.
> Therefore, for use with current computers, 32 bits would be needed.
> Some games can be played with mixed sizes by doing things like setting
> the high bit, for 128 7-bit characters, 32768 15-bit characters, and
> 2147483648 31-bit characters.
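A minimal sketch of the "set the high bit" idea, assuming a stream of 16-bit units (the unit size, the names `encode`/`decode`, and the exact bit layout are hypothetical, not from the thread): a unit with the high bit clear carries a 15-bit character, and a unit with the high bit set is joined with the following unit to carry a 31-bit character. One flag bit can only tell two widths apart, so adding the 7-bit byte-sized tier as well would need a second flag bit, as UTF-8 does with its multi-bit lead-byte prefixes.

```python
# Hypothetical two-width encoding in 16-bit units (a sketch, not a standard).
HIGH = 0x8000        # flag bit: set means "this unit starts a two-unit character"
MAX_SHORT = 1 << 15  # 32768 codes fit in one unit
MAX_LONG = 1 << 31   # 2147483648 codes fit in two units


def encode(codes):
    """Encode code numbers into 16-bit units, using the shortest form."""
    units = []
    for c in codes:
        if not 0 <= c < MAX_LONG:
            raise ValueError(f"code {c} does not fit in 31 bits")
        if c < MAX_SHORT:
            units.append(c)                   # high bit clear: 15-bit character
        else:
            units.append(HIGH | (c >> 16))    # top 15 bits, flagged
            units.append(c & 0xFFFF)          # low 16 bits in the next unit
    return units


def decode(units):
    """Reverse encode(); assumes decoding starts at a character boundary."""
    codes = []
    i = 0
    while i < len(units):
        u = units[i]
        if u & HIGH:
            if i + 1 >= len(units):
                raise ValueError("truncated two-unit character")
            codes.append(((u & 0x7FFF) << 16) | units[i + 1])
            i += 2
        else:
            codes.append(u)
            i += 1
    return codes


if __name__ == "__main__":
    sample = [0x41, 0x65E5, 0x20000]  # arbitrary sample code numbers
    print([hex(u) for u in encode(sample)])  # ['0x41', '0x65e5', '0x8002', '0x0']
    print(decode(encode(sample)) == sample)  # True
```

Because the second unit of a long character may have its high bit clear, a decoder dropped into the middle of a stream cannot always find the next character boundary. UTF-16 avoids this by reserving separate surrogate ranges for the two halves.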


More information about the cctalk mailing list