Text encoding Babel. Was Re: George Keremedjiev
Grant Taylor
cctalk at gtaylor.tnetconsulting.net
Fri Nov 30 17:11:39 CST 2018
On 11/30/2018 03:57 PM, Sean Conner via cctalk wrote:
> There are several problems with this. One, how many bits do you set
> aside per character? 8? 16? There are potentially an open ended set
> of stylings that one might use.
I acknowledge that the idea I shared was incomplete and likely has
shortcomings. But I do think that it demonstrates a concept, which is
what I was after.
> Second problem---where do you store such bits? Not to imply this is a
> bad idea, just that there are issues that need to be resolved with how
> things are done today (how does this interact with UTF-8 for instance?
> Or UCS-4?).
Ideally, I'd like to see Unicode code points (encoded as UTF-8 / UTF-16)
for the different styles of a letter. Not every letter (character ~>
byte / double) needs the styling. So I suspect that it would be better
to judiciously place the styled code points in the Unicode space.
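As a rough illustration of what such code points look like, here is a minimal Python sketch that inspects a plain "t", the italic "𝑡" (U+1D461) and the italic "ℎ" (U+210E) used in the examples in this message. The helper name `describe` is hypothetical; it relies only on the standard `unicodedata` module and on `str.encode`, and it is meant as a sketch rather than a statement about how any particular mail client handles these characters.

```python
import unicodedata

def describe(ch):
    """Return the code point, Unicode name, and encoded sizes of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # Size in bytes under each encoding (big-endian UTF-16, so no BOM is added).
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_bytes": len(ch.encode("utf-16-be")),
    }

# Plain "t", mathematical italic "t", and the italic "h" (encoded as PLANCK CONSTANT).
for ch in ("t", "\U0001D461", "\u210E"):
    print(describe(ch))
```

The styled letters are separate code points from the plain ones, and the ones outside the Basic Multilingual Plane take four bytes in both UTF-8 and UTF-16.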
Sadly, when I search for "this", the letters aren't found in
"𝑡ℎ𝑖𝑠 𝑖𝑠 𝑎 𝑠𝑡𝑟𝑖𝑛𝑔" or "𝘁𝗵𝗶𝘀 𝗶𝘀 𝗮 𝗰𝗼𝗺𝗺𝗲𝗻𝘁",
which is something that I think should work.
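One way a search tool could make this work is to fold both the text and the search term through Unicode compatibility normalization (NFKC) before comparing. The sketch below assumes that approach; the function names `fold` and `styled_contains` are hypothetical, and this is not a claim about how any existing search tool behaves. Note that normalization throws the styling away, so it only helps with matching, not with preserving the style.

```python
import unicodedata

def fold(text):
    """Map styled letters to their plain equivalents and ignore case."""
    return unicodedata.normalize("NFKC", text).casefold()

def styled_contains(haystack, needle):
    """Substring search that treats styled and plain letters as equal."""
    return fold(needle) in fold(haystack)

# "this" in mathematical italic and in mathematical sans-serif bold,
# written with escapes so the source stays plain ASCII.
italic_this = "\U0001D461\u210E\U0001D456\U0001D460"
bold_this = "\U0001D601\U0001D5F5\U0001D5F6\U0001D600"

print("this" in italic_this)                 # plain search on the raw text
print(styled_contains(italic_this, "this"))  # search after NFKC folding
print(styled_contains(bold_this, "this"))
```

The plain substring test fails because the styled letters are different code points, while the folded comparison matches both styled spellings.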
Also, storage of these letters can work just like it is in this email. ;-)
--
Grant. . . .
unix || die