Text encoding Babel. Was Re: George Keremedjiev

Grant Taylor cctalk at gtaylor.tnetconsulting.net
Tue Nov 27 19:33:21 CST 2018


On 11/27/2018 04:43 PM, Keelan Lightfoot via cctalk wrote:
> I'm a bit dense for weighing in on this as my first post, but what 
> the heck.

Welcome.  :-)

> Our problem isn't ASCII or Unicode, our problem is how we use computers.

Okay.

> Going back in time a bit, the first keyboards only recorded letters 
> and spaces, even line breaks required manual intervention. As things 
> developed, we upgraded our input capabilities a little bit (return 
> keys! delete keys! arrow keys!), but then, some time before graphical 
> displays came along, we stopped upgrading. We stopped increasing the 
> capabilities of our input, and instead focused on kludges to make them do 
> more.

Do you think that we stopped enhancing the user input experience more 
because we were content with what we had or because we didn't see a 
better way to do what we wanted to do?

> We created markup languages, modifier keys, and page description 
> languages, all because our input devices and display devices lacked 
> the ability to comprehend anything more than letters.  Now we're in a 
> position where we have computers with rich displays bolted to a keyboard 
> that has remained unchanged for 150 years.

Hum....

> Unpopular opinion time: Markup languages are a kludge, relying on plain 
> text to describe higher level concepts.

I agree that markup languages are a kludge.  But I don't know that they 
require plain text to describe higher level concepts.

I see no reason that we can't have new control codes to convey new 
concepts if they are needed.

Aside:  ASCII did what it needed to do at the time.  Times are different 
now.  We may need more / new / different control codes.

By control codes, I mean a specific binary sequence that means a 
specific thing.  I think it either needs to be standardized to be 
compatible with other things, or it needs to be considered local and 
proprietary to an application.
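
Terminal escape sequences are an existing precedent here: the ECMA-48 / 
ANSI SGR codes are standardized in-band byte sequences that convey text 
attributes, and most terminal emulators honor them.  A minimal shell 
illustration (rendering depends on the terminal):

  # SGR (Select Graphic Rendition): ESC [ 1 m turns bold on;
  # ESC [ 0 m resets all attributes.
  printf '\033[1mthis is bold\033[0m and this is not\n'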

> TeX has held us back.  It's a crutch so religiously embraced by the 
> people that make our software that the concept of markup has come to be 
> accepted "the way".  I worked with some university students recently, 
> who wasted a ridiculous amount of time learning to use LaTeX to document 
> their projects. Many of them didn't even know that page layout software 
> existed, they thought there was this broad valley in capabilities with 
> TeX on one side, and Microsoft Word on the other. They didn't realize 
> that there is a whole world of purpose built tools in between.

I actually wonder how much need there is for /all/ of those utilities. 
I expect that things should have streamlined and simplified, at least 
somewhat, in the last 30 years.

> Rather than working on developing and furthering our input capabilities, 
> we've been focused on keeping them the same. Markup languages aren't the 
> solution. They are a clumsy bridge between 150 year old input technology 
> and modern display capabilities.

What would you like to do or see done differently?  Even if it turns out 
to be worse, it would still be something different and likely worth 
trying at least once.

> Bold or italic or underlined text shouldn't be a second-class 
> concept; they have meaning that can be lost when text is conveyed in 
> circa-1868-plain-text.  I've read many letters that predate the invention 
> of the typewriter; emphasis is often conveyed using underlines or darkened 
> letters.

I don't think of bold or italic or underline as second class concepts. 
I tend to think of the following attributes that can be applied to text:

  · bold
  · italic
  · overline
  · strike-through
  · underline
  · superscript exclusive or subscript
  · uppercase exclusive or lowercase
  · opposing case
  · normal (none of the above)

I don't think that normal is superior to the others in any way.  I do 
think that normal occurs VASTLY more frequently than any combination of 
the others.  As such, normal is what things default to as an 
optimization.  IMHO that optimization does not relegate the other 
styles to second class.
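
As it happens, most of that list already has standardized control codes 
in the terminal world.  Continuing the SGR sketch from above (overline 
support varies by terminal, and some terminals add nonstandard SGR 
codes for superscript and subscript):

  # ECMA-48 SGR parameters: 1=bold, 3=italic, 4=underline,
  # 9=strike-through, 53=overline; 0 resets.
  for n in 1 3 4 9 53; do
      printf '\033[%smSGR %s\033[0m  ' "$n" "$n"
  done
  printf '\n'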

> We've drawn this arbitrary line in the sand, where only letters that 
> can be typed on a typewriter are "text".  Everything else is fluff that 
> has been arbitrarily decided to convey no meaning.

I don't agree that the decision was made (by most people).  At least not 
consciously.

I will say that some people probably decided what a minimum viable 
product was when selling typewriters, and consciously chose to omit the 
other options.

> I think it's a safe argument to make that the primary reason we've 
> painted ourselves into this unexpressive corner is because of a dogged 
> insistence that we cling to the keyboard.

I see no reason that the keyboard can't have keys / glyphs added to it.

I'm personally contemplating adding additional keys (via an add on 
keyboard) that are programmed to produce additional symbols.  I 
frequently use the following symbols and wish I had keys for easier 
access to them:  ≈, ·, ¢, ©, °, …, —, ≥, ∞, ‽, ≤, µ, ≠, Ω, ½, ¼, ⅓, ¶, 
±, ®, §, ¾, ™, ⅔, ¿, ⊕.

When I say frequently, I mean that I use some of them daily, many of 
them weekly, and others monthly.  I've written tiny shell scripts that 
I can run to put a given character on the clipboard so that I can 
easily paste it where I need it.  (The order of the symbols above 
reflects the alphabetical order of their names.)

Frequently enough, in other words, that I'd like about ¼ of them on 
keys on my keyboard(s) that I can press.  Ideally, they would be 
inserted directly instead of going through the clipboard.
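
For illustration, a minimal version of such a script, assuming an X11 
desktop with xclip installed (xdotool, where available, can skip the 
clipboard and type the character straight into the focused window):

  #!/bin/sh
  # Put the approximately-equal sign on the X11 clipboard.
  printf '%s' '≈' | xclip -selection clipboard

  # Direct-insertion alternative, no clipboard round trip:
  # xdotool type '≈'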

How could I forget:

(╯°□°)╯︵ ┻━┻
¯\_(ツ)_/¯
┬─┬ ノ( ゜-゜ノ)

So … I'm in favor of extending the keyboard.  :-)

> I like the C comment example; Why do I need to call out a comment with 
> a special sequence of letters? Why can't a comment exist as a comment? 
> Why is a comment a second class concept? When I take notes in the 
> margin, I don't explicitly need to call them out as notes.

Ah, but you are explicitly calling them out by where you place them.

We need a way to tell the computer that something is a comment.  One 
method is to use markup / control key sequences.  Another method is 
what Google Docs does: highlight the text to comment on, click the 
comment balloon, and enter the comment in the comment box.  There's no 
key sequence, but there is an explicit indication that something is a 
comment.  Said comment is displayed in the right-hand margin, so I know 
it's a comment.

Note:  That's all user interface and user experience.  It says nothing 
about the underlying storage method.  My understanding is that the 
document is stored as a collection of JSON markup.  Or a "file format", 
as I've been saying.
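
I don't know Google Docs' actual format, but as a purely hypothetical 
sketch (every field name here is invented), a file format that stores 
comments as first-class structure rather than as in-band markup might 
look something like:

  {
    "text": "We need a way to tell the computer ...",
    "comments": [
      {
        "anchor": [10, 18],
        "author": "grant",
        "body": "Is 'tell' the right verb here?"
      }
    ]
  }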

> This extends to strings, why do I need to use quotes? I know it's a 
> string why can't the computer remember that too? Why do I have to use 
> the capabilities of a typewriter to describe that to the computer?

I don't have to tell m4 that something is a string or a number or 
anything else.  I simply define the thing, whatever I want to call it, 
and assign it a value.  Then anywhere I use that name, the name is 
substituted with the value that I assigned to it.
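
A quick GNU m4 illustration: define a name once, and m4 substitutes it 
wherever it appears, with no type or quoting ceremony at the point of 
use.

  $ m4 <<'EOF'
  define(`greeting', `Hello, world')dnl
  define(`answer', `42')dnl
  greeting.  The answer is answer.
  EOF
  Hello, world.  The answer is 42.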

I believe that some languages treat strings, arrays, hashes, etc. as 
separate namespaces (Perl, for example, with its $scalar, @array, and 
%hash sigils).  Thus you must indicate which namespace you mean.  I 
don't see any reason that we can't have a common namespace and allow 
the named entity to include information about what type of entity it 
is.  This seems like a programming language design issue, not a human 
keyboard issue.

Or did I completely misinterpret what you meant by strings?

> There seems to be confusion that computers are inherently text based. They 
> are only that way because we program them and use them that way, and 
> because we've done it the same way since the day of the teletype, and 
> it's _how it's done._

I will concede that many computers and / or programming languages do 
operate in terms of text.  But I am fairly confident that there are 
some programming languages (I don't know about computers) that work 
differently.  Specifically, simple objects are included as part of the 
language, and more complex objects are built using the simpler 
objects.  Dia and (what I understand of) Minecraft come to mind.

I'm fairly confident that NeXTSTEP also did some things along these 
lines, but I'm not sure.

> "Classic" Macs are a great example of breaking this pattern. There was 
> no way to force the computer into a text mode of operating, it didn't 
> exist. Right down to the core the operating system was graphical. When 
> you click an icon, the computer doesn't issue a text command, it doesn't 
> call a function by name, it merely alters the flow of some binary stuff 
> flowing through the CPU in response to some other bits changing. Yes, 
> the program describing that was written in text, but that text is not 
> what the computer is interpreting.

I don't think that text mode vs. GUI mode is a good comparison.  Text 
mode happens to be common, likely because it's simpler to start with, 
but I don't see it as a requirement.

Visual Basic keeps coming to mind as I read your paragraph.  (From what 
I understand) VB is inherently visually / graphically oriented.  You 
start with the visual components, and then assign actions to various 
pieces.  Those actions are likely text.  But I see no reason that you 
couldn't extend the model to execute other things.

> I'm getting a bit philosophical,

I don't see any problem with that.

> so I'll shut up now,

Please don't do that.

> but it's an interesting discussion.

Yes, it is.  :-)



-- 
Grant. . . .
unix || di

