Text encoding Babel. Was Re: George Keremedjiev

Keelan Lightfoot keelanlightfoot at gmail.com
Tue Nov 27 17:43:51 CST 2018


I'm a bit dense for weighing in on this as my first post, but what the heck.

Our problem isn't ASCII or Unicode; our problem is how we use computers.

Going back in time a bit, the first keyboards only recorded letters
and spaces; even line breaks required manual intervention. As things
developed, we upgraded our input capabilities a little bit (return
keys! delete keys! arrow keys!), but then, some time before graphical
displays came along, we stopped upgrading. We stopped increasing the
capabilities of our input, and instead focused on kludges to make them
do more. We created markup languages, modifier keys, and page
description languages, all because our input devices and display
devices lacked the ability to comprehend anything more than letters.
Now we're in a position where we have computers with rich displays
bolted to a keyboard that has remained unchanged for 150 years.

Unpopular opinion time: Markup languages are a kludge, relying on
plain text to describe higher level concepts. TeX has held us back.
It's a crutch so religiously embraced by the people that make our
software that the concept of markup has come to be accepted as "the way".
I worked with some university students recently, who wasted a
ridiculous amount of time learning to use LaTeX to document their
projects. Many of them didn't even know that page layout software
existed; they thought there was this broad valley in capabilities with
TeX on one side, and Microsoft Word on the other. They didn't realize
that there is a whole world of purpose-built tools in between. Rather
than working on developing and furthering our input capabilities,
we've been focused on keeping them the same. Markup languages aren't
the solution. They are a clumsy bridge between 150-year-old input
technology and modern display capabilities.

Bold or italic or underlined text shouldn't be a second-class concept;
they have meaning that can be lost when text is conveyed in
circa-1868-plain-text. I've read many letters that predate the
invention of the typewriter; in them, emphasis is often conveyed using
underlines or darkened letters. We've drawn this arbitrary line in the
sand, where only letters that can be typed on a typewriter are "text".
Everything else is fluff that has been arbitrarily decided to convey
no meaning. I think it's a safe argument to make that the primary
reason we've painted ourselves into this unexpressive corner is
because of a dogged insistence that we cling to the keyboard.

I like the C comment example: why do I need to call out a comment with
a special sequence of letters? Why can't a comment exist as a comment?
Why is a comment a second-class concept? When I take notes in the
margin, I don't explicitly need to call them out as notes. This
extends to strings: why do I need to use quotes? I know it's a string;
why can't the computer remember that too? Why do I have to use the
capabilities of a typewriter to describe that to the computer? There
seems to be confusion that computers are inherently text based. They
are only that way because we program them and use them that way, and
because we've done it the same way since the day of the teletype, and
it's _how it's done._
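
A minimal sketch of what "a comment exists as a comment" could look
like, assuming a hypothetical editor that stores a program as typed
pieces rather than as a run of characters. The names Comment, Text,
Code and render_as_c are made up for illustration, and Python is used
only for brevity. The delimiters show up only when the pieces are
flattened to plain text for a conventional C compiler:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Comment:
    body: str  # a note; nothing like /* */ is needed to mark it


@dataclass
class Text:
    body: str  # a string value; no quotes are needed to mark it


@dataclass
class Code:
    body: str  # ordinary program tokens


Node = Union[Comment, Text, Code]


def render_as_c(nodes: List[Node]) -> str:
    """Flatten typed nodes to plain C text, adding delimiters only here."""
    out = []
    for node in nodes:
        if isinstance(node, Comment):
            # Assumption: the body does not itself contain "*/".
            out.append("/* " + node.body + " */")
        elif isinstance(node, Text):
            # Escape backslashes first, then double quotes.
            escaped = node.body.replace("\\", "\\\\").replace('"', '\\"')
            out.append('"' + escaped + '"')
        else:
            out.append(node.body)
    return " ".join(out)


# Hypothetical program: the string and the comment carry no markup.
program = [
    Code("puts("),
    Text('say "hi"'),
    Code(");"),
    Comment("greet the user"),
]

print(render_as_c(program))
```

In this sketch the quoting and escaping rules live entirely in
render_as_c; the stored program never needs them, because each piece
already remembers what it is.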

"Classic" Macs are a great example of breaking this pattern. There was
no way to force the computer into a text mode of operating; it didn't
exist. Right down to the core, the operating system was graphical. When
you click an icon, the computer doesn't issue a text command, it
doesn't call a function by name, it merely alters the flow of some
binary stuff flowing through the CPU in response to some other bits
changing. Yes, the program describing that was written in text, but
that text is not what the computer is interpreting.

I'm getting a bit philosophical, so I'll shut up now, but it's an
interesting discussion.

- Keelan


More information about the cctalk mailing list