Text encoding Babel. Was Re: George Keremedjiev

Grant Taylor cctalk at gtaylor.tnetconsulting.net
Fri Nov 30 17:11:39 CST 2018


On 11/30/2018 03:57 PM, Sean Conner via cctalk wrote:
> There are several problems with this.  One, how many bits do you set 
> aside per character?  8?  16?  There are potentially an open ended set 
> of stylings that one might use.

I acknowledge that the idea I shared was incomplete and likely has 
shortcomings.  But I do think that it demonstrates a concept, which is 
what I was after.

> Second problem---where do you store such bits?  Not to imply this is a 
> bad idea, just that there are issues that need to be resolved with how 
> things are done today (how does this interact with UTF-8 for instance? 
> Or UCS-4?).

Ideally, I'd like to see Unicode code points (encoded as usual in UTF-8 
or UTF-16) for the different styles of a letter.  Not every character 
needs the styling, so I suspect that it would be better to judiciously 
place styled code points in the Unicode space than to set aside style 
bits for every character.
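
To make the storage question concrete, here is a minimal Python sketch. 
The letter chosen is only an illustration.  It assumes nothing beyond 
the standard library and shows that a styled letter is simply another 
code point, which UTF-8 and UTF-16 serialize in the usual way with no 
separate style bits.

```python
import unicodedata

plain = "t"
styled = chr(0x1D461)  # a styled "t" from the Mathematical Alphanumeric Symbols block

for ch in (plain, styled):
    # Show the code point, its Unicode name, and its bytes in each encoding.
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "| UTF-8:", ch.encode("utf-8").hex(" "),
        "| UTF-16BE:", ch.encode("utf-16-be").hex(" "),
    )
```

The plain letter takes one byte in UTF-8, while the styled one takes 
four bytes in UTF-8 and a surrogate pair in UTF-16.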

Sadly, when I try to search for "this", the letters aren't found in 
"𝑡ℎ𝑖𝑠 𝑖𝑠 𝑎 𝑠𝑡𝑟𝑖𝑛𝑔" or "𝘁𝗵𝗶𝘀 𝗶𝘀 𝗮 𝗰𝗼𝗺𝗺𝗲𝗻𝘁", 
which is something that I think should work.
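
For what it's worth, a search tool could get there by normalizing text 
before comparing.  The sketch below is one possible approach, not how 
any particular mail client or search engine works: it uses Python's 
standard unicodedata module and NFKC compatibility normalization, which 
folds these styled letters back to plain ones.  The helper name 
fold_styles is made up for illustration.

```python
import unicodedata

def fold_styles(text: str) -> str:
    """Hypothetical helper: fold styled letters to plain ones via NFKC."""
    return unicodedata.normalize("NFKC", text)

italic = "𝑡ℎ𝑖𝑠 𝑖𝑠 𝑎 𝑠𝑡𝑟𝑖𝑛𝑔"
bold = "𝘁𝗵𝗶𝘀 𝗶𝘀 𝗮 𝗰𝗼𝗺𝗺𝗲𝗻𝘁"

# A plain substring search does not match the styled text...
print("this" in italic)
# ...but it does once the text has been folded.
print("this" in fold_styles(italic))
print("this" in fold_styles(bold))
```

The trade-off is that NFKC also folds other compatibility characters 
(ligatures, full-width forms, and so on), so the styling is lost in the 
folded copy; it is best used only for the search index, not for storage.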

Also, storage of these letters can work just as it does in this email.  ;-)



-- 
Grant. . . .
unix || die