Text encoding Babel. Was Re: George Keremedjiev

Grant Taylor cctalk at gtaylor.tnetconsulting.net
Fri Nov 30 16:22:51 CST 2018

On 11/30/2018 11:34 AM, Keelan Lightfoot via cctalk wrote:
> Thanks!


> Both. In the beginning we were content, because the keyboard was well 
> suited to the capabilities of the technology available at the time it was 
> invented. We didn't see a better way, because when compared to using a pen 
> and paper (for writing) or using toggle switches (to control a computer), 
> a keyboard was a significant improvement. It's the explosive growth 
> and universal adoption of computers that has locked us in to the keyboard 
> as the standard.

*sigh*  The comments from Security Now's Steve G. about passwords not 
going away come to mind and seem apropos for keyboards.

There are other, likely better, things out there.  But keyboards 
themselves aren't going to go away.

> I disagree with this; from a usability standpoint, control codes are 
> problematic. Either the user needs to memorize them, or software needs 
> to inject them at the appropriate times.


> There's technical problems too; when it comes to playing back a stream 
> of characters, control characters mean that it is impossible to just 
> start listening. It is difficult to fast forward and rewind in a file, 
> because the only way to determine the current state is to replay the 
> file up to that point.

Now I'm wondering about something akin to the differences in upper case 
and lower case.  Functionally the same code, just a different value in 
the 6th bit.

What if there were (functionally) additional bits that indicated various 
other (what I was calling) stylings?

I think that something along those lines could help avoid a concern I 
have.  Namely, how do I search for an A, whatever "style" it's in?  I 
think I could hypothetically search for bytes ~> words (characters) 
containing (xxxxxxxx xxxxxxxx) (xxxxxxxx) 01x00001 (assuming that the 
preceding don't-cares are set appropriately) and find any format of A: 
upper case, lower case, bold, italic, underline, strike through, etc.

The other thing that the additional bits / flags could do is allow the 
bytes (words / characters) to be read mid-stream.
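
A rough sketch of the masked search and the mid-stream read, in Python. 
The layout is purely my assumption for illustration (low 8 bits hold the 
ASCII code, higher bits hold made-up style flags named BOLD, ITALIC, 
UNDERLINE and STRIKE); it is not any existing standard.

```python
# Hypothetical word layout (an assumption, not a real encoding):
# bits 0-7 = ASCII code, bits 8 and up = style flags.
CASE_BIT = 0x20            # the bit separating 'A' (0x41) from 'a' (0x61)
BOLD = 1 << 8
ITALIC = 1 << 9
UNDERLINE = 1 << 10
STRIKE = 1 << 11

BASE_MASK = 0xFF & ~CASE_BIT   # keep the character bits, ignore case and style


def styled(ch, flags=0):
    """Pack one ASCII character and its style flags into a single word."""
    return ord(ch) | flags


def find_letter(words, letter):
    """Return the indexes of every word holding `letter` in any case or style."""
    want = ord(letter) & BASE_MASK
    return [i for i, w in enumerate(words) if (w & BASE_MASK) == want]


def style_of(word):
    """Read the style of one word on its own, with no earlier context."""
    return word & ~0xFF


text = [styled("A", BOLD), styled("b"), styled("a", ITALIC | UNDERLINE), styled("A")]
print(find_letter(text, "a"))                    # [0, 2, 3]
print(style_of(text[2]) == ITALIC | UNDERLINE)   # True
```

One caveat: masking out the case bit only makes sense for letters.  It 
would also treat '@' and '`' as the same character, so a real search 
would need to check that it is looking at a letter first.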

> Do you mean modal control codes? As in "everything after here is bold" 
> and "the bold stops here"?

Yes.  That's what I was thinking when I wrote that.
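
To illustrate the replay problem with modal codes, here is a minimal 
Python sketch.  The two control characters are arbitrary picks of mine 
for the example, not codes that any real format uses for bold.

```python
# Hypothetical modal codes, chosen only for this illustration.
BOLD_ON, BOLD_OFF = "\x0e", "\x0f"


def bold_at(stream, index):
    """Replay the stream from the start to learn whether `index` is bold."""
    bold = False
    for ch in stream[:index]:
        if ch == BOLD_ON:
            bold = True
        elif ch == BOLD_OFF:
            bold = False
    return bold


stream = "plain " + BOLD_ON + "bold" + BOLD_OFF + " plain"
print(bold_at(stream, 8))    # True
print(bold_at(stream, 13))   # False
```

Answering the question for a single position means scanning everything 
before it, which is exactly why you can't just start listening in the 
middle.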

> We've gone backwards sadly. For a brief while, this kind of rich user 
> interface stuff was provided by the OS. A text box, regardless of 
> the application, would use the OS's text box control, and would have 
> a universal interface for rich text.


> But the growth of the web has resulted in an atavism. We're back to plain 
> text, and using markup to style our text.

I mostly agree.  But I do wonder how true that actually is, at least on 
a technical level.  I think the text input box can be enhanced to allow 
more than just plain text.

> If I want bold text in Slack, I have to use markup.  Facebook Messages and 
> YouTube comments also support markup, but the syntax is slightly different 
> between them.


> Back in 1991, If I wanted bold text in any application that supported 
> rich text on my SE/30, I hit command-B and I got bold text. Sure, there 
> are Javascript rich text editors that can be bolted on, but they all 
> have their own UI concepts, and they're all a trainwreck.

I believe that we can do better.

> In addition to crusty old computers, I also enjoy the company of three 
> crusty old Linotypes. In fact, that's what got me thinking about this 
> stuff in the first place. The Linotype keyboard has 90 keys, which 
> directly map to the 90 glyphs a Linotype can "render". The keyboard is 
> laid out in three equal-sized sections: lowercase letters on the left, 
> uppercase on the right, with numbers and punctuation in the middle. 
> Push the button, and what's marked on the button is what ultimately 
> ends up on the page. Each Linotype mat (matrix; letter mold) has two 
> positions, which can be selected by flipping a little lever when they're 
> being assembled into a line. The two positions are almost always used 
> to select between two versions of a font; roman/bold or roman/italic 
> are the most common pairings.

Intriguing.  I have a vague mental image of what you're talking about 
after watching Linotype: The Film (http://www.linotypefilm.com).  I 
found it quite entertaining and informative.

> But what it means is that you can walk up to a machine with a half-typed 
> line in the assembler and immediately determine its state.  Any mats 
> set in the bold position are in a physically different position in 
> the assembler. The position of the switch tells you if you're typing 
> in bold or roman. When you push the 'A' key, you know an uppercase 'A' 
> in bold will be added to the line. Additionally, the position of that 
> switch can be verified without taking your eyes off of the copy. There 
> is no black magic, no spooky action at a distance.  The capabilities of 
> the machine are immediately apparent.

I was not aware of the physically different positions.  Either I don't 
remember it, I didn't pick up on it, or they didn't go into that in the 
Linotype film.  Being able to determine the current position without 
removing your eyes from the copy does sound like a very good thing.

> I agree. I think that they're normal enough that they should exist 
> as their own code points in unicode. Our 'standard' coding treats 
> 'formatting' as optional. IOW, I agree more!


> Consciously omitted in the beginning, yes, otherwise typewriters would 
> have never been affordable enough to become mainstream. But that has led 
> to "plain text" becoming the de-facto standard.


> It's 2018, and I can't type italic text in this e-mail without 
> potentially causing some people problems, 𝘣𝘶𝘵 𝘐'𝘮 
> 𝘸𝘪𝘭𝘭𝘪𝘯𝘨 𝘵𝘰 𝘨𝘪𝘷𝘦 𝘪𝘵 𝘢 
> 𝘵𝘳𝘺.

It came through for me.  -  It even made it through my clipboard, sed, 
and fmt.  :-)

Aside:  I suspect that none of the (plain-text / non-MIME) digest 
subscribers will see that properly.  I've found out from Mark S., 
Mailman's maintainer, that UTF-8 isn't maintained in the plain-text / 
non-MIME digest.
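
I don't know what Mailman actually does to those characters, so the 
following Python sketch only shows why they are fragile: each styled 
letter is a four-byte UTF-8 sequence with no ASCII representation.  The 
function name to_ascii_lossy is mine, and the "replace with ?" behaviour 
is an assumption about a lossy conversion, not a description of Mailman.

```python
def to_ascii_lossy(text):
    """Force text into ASCII, replacing anything unrepresentable with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


ch = "\U0001D622"   # MATHEMATICAL SANS-SERIF ITALIC SMALL A
print(ch.encode("utf-8").hex(" "))   # f0 9d 98 a2
print(to_ascii_lossy("a" + ch))      # a?
```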

> I agree. But if they're added as a touch screen, shoot me now.  "Haptics" 
> has mutated into "shakes when you touch it", instead of "you can feel 
> the button".

Don't get me started on how inferior a device shaking is compared to a 
buckling spring.

I have no doubt that things will end up on a touch screen.  I'm 
confident that some are already there.

> I have the little DigiStump Arduino thingy somewhere that I bought to 
> use for exactly that purpose! My goal is to create a Linotype style 
> keyboard, the middle bank of 30 keys tailored to my application, as was 
> often done with the Linotype. I have one Linotype with dedicated E13B 
> keys for setting the magnetic ink characters at the bottom of cheques.

That sounds intriguing.

> I find that having the extra glyphs readily available means that I use 
> them more in everyday communication; I use the trademark symbol (which is 
> only slightly buried on a mac keyboard) quite often to convey a sense of 
> sarcasm (i.e. "sure, we could do that, but there is no One True Way™, 
> regardless of what the sales people told you...").

Yep.  I find the same.

I also find that I want to use them more as I do use them.  Or rather, 
it seems that if I know that there is a solution (character) that is 
appropriate in this situation, and I know how to use it, I'm going to 
use it!  Sort of like adding words to your vocabulary.

> When my Linotype 2000 keyboard enters production, I'll let you know ;)

That sounds a little bit tongue-in-cheek.  But I think that you could 
make something like that happen.  I suspect that there are a number of 
people that would be interested in that.  Dare I say it... especially if 
it was USB connected.

> What if comment characters had their own unicode code points? A bit silly, 
> yes, but that's the lines I'm thinking along. It would allow me to put 
> comments right inside my code if I found myself stricken with such a 
> desire to produce prodigiously incomprehensible programming!

Using the methodology I laid out above, where I could still search for 
the base A character, sure.
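
As it happens, the styled letters used in this thread are Unicode 
mathematical alphanumeric symbols, and compatibility normalization 
(NFKC) folds them back to plain letters.  A small Python sketch of 
searching for the base character that way; the helper names 
to_sans_italic, fold and find are mine.

```python
import unicodedata

SANS_ITALIC_SMALL_A = 0x1D622   # MATHEMATICAL SANS-SERIF ITALIC SMALL A


def to_sans_italic(text):
    """Style lowercase ASCII letters as mathematical sans-serif italic."""
    return "".join(
        chr(SANS_ITALIC_SMALL_A + ord(c) - ord("a")) if "a" <= c <= "z" else c
        for c in text
    )


def fold(text):
    """Fold styled compatibility characters to base letters, ignoring case."""
    return unicodedata.normalize("NFKC", text).casefold()


def find(haystack, needle):
    """Index of needle in the folded haystack, or -1 if absent."""
    return fold(haystack).find(fold(needle))


fancy = to_sans_italic("but i'm willing to give it a try")
print(fancy == "but i'm willing to give it a try")   # False
print(fold(fancy))                # but i'm willing to give it a try
print(find(fancy, "WILLING"))     # 8
```

The catch is that folding throws the styling away, so this finds the 
text but can't tell you afterwards whether it was a comment or a string.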

> I'm going to lavish on the unicode for this example, so those of you 
> properly unequipped may not see this example:
> foo := 𝑡ℎ𝑖𝑠 𝑖𝑠 𝑎 𝑠𝑡𝑟𝑖𝑛𝑔 
> 𝘁𝗵𝗶𝘀 𝗶𝘀 𝗮 𝗰𝗼𝗺𝗺𝗲𝗻𝘁 
> printf(𝑡ℎ𝑒 𝑠𝑡𝑟𝑖𝑛𝑔 𝑖𝑠 ① 𝑖𝑠𝑛𝑡 
> 𝑡ℎ𝑎𝑡 𝑒𝑥𝑐𝑖𝑡𝑖𝑛𝑔, foo) if 𝘁𝗵𝗶𝘀 
> 𝗶𝘀 𝗮 𝗽𝗼𝗼𝗿𝗹𝘆 𝗽𝗹𝗮𝗰𝗲𝗱 
> 𝗰𝗼𝗺𝗺𝗲𝗻𝘁 foo == 𝑡ℎ𝑖𝑠 𝑖𝑠 
> 𝑎𝑙𝑠𝑜 𝑎 𝑠𝑡𝑟𝑖𝑛𝑔, 𝑏𝑢𝑡 𝑛𝑜𝑡 
> 𝑡ℎ𝑒 𝑠𝑎𝑚𝑒 𝑜𝑛𝑒 { 𝘁𝗵𝗶𝘀 𝗶𝘀 
> 𝗮𝗹𝘀𝗼 𝗮 𝗰𝗼𝗺𝗺𝗲𝗻𝘁 ...

It works for me.  :-)

Aside: I tried to subscribe additional addresses to see both the MIME 
and plain text digest.  But I'm waiting on moderator approval.  :-/

> An atrocious example, but a good demonstration of my point. If I had a 
> toggle switch on my keyboard to switch between code, comment and string, 
> it would have been much simpler to construct too!


> I don't deny that they exist, but there are no significant applications 
> being developed with them.


> I'm not sure how VB puts things together behind the scenes. The example 
> I am more familiar with is HyperCard, where instead of the UI existing 
> in the code, the code exists in the UI as you describe. This of course 
> violates all the religious tenets of Model-View-Controller design (for 
> the most part, dogmatic adherence to that pattern has mainly served to 
> give us government software projects that die as multi-trillion dollar 
> failures before seeing the light of day. I am of course being a bit 
> facetious, but not entirely). Back on topic, the tools exist, but they 
> are often seen as toys and not serious software development tools. Are 
> we at the point where the compiler for a visual programming language is 
> written in the visual programming language?


Grant. . . .
unix || die
