Text encoding Babel. Was Re: George Keremedjiev

Sean Conner spc at conman.org
Tue Nov 27 22:11:22 CST 2018


It was thus said that the Great Keelan Lightfoot via cctalk once stated:
> I'm a bit dense for weighing in on this as my first post, but what the heck.
> 
> Our problem isn't ASCII or Unicode, our problem is how we use computers.
> 
> Going back in time a bit, the first keyboards only recorded letters
> and spaces, even line breaks required manual intervention. As things
> developed, we upgraded our input capabilities a little bit (return
> keys! delete keys! arrow keys!), but then, some time before graphical
> displays came along, we stopped upgrading. We stopped increasing the
> capabilities of our input, and instead focused on kludges to make them
> do more. We created markup languages, modifier keys, and page
> description languages, all because our input devices and display
> devices lacked the ability to comprehend anything more than letters.
> Now we're in a position where we have computers with rich displays
> bolted to a keyboard that has remained unchanged for 150 years.

  Do you have anything in particular in mind?

> Unpopular opinion time: Markup languages are a kludge, relying on
> plain text to describe higher level concepts. TeX has held us back.
> It's a crutch so religiously embraced by the people that make our
> software that the concept of markup has come to be accepted "the way".
> I worked with some university students recently, who wasted a
> ridiculous amount of time learning to use LaTeX to document their
> projects. Many of them didn't even know that page layout software
> existed, they thought there was this broad valley in capabilities with
> TeX on one side, and Microsoft Word on the other. They didn't realize
> that there is a whole world of purpose built tools in between. Rather
> than working on developing and furthering our input capabilities,
> we've been focused on keeping them the same. Markup languages aren't
> the solution. They are a clumsy bridge between 150 year old input
> technology and modern display capabilities.
> 
> Bold or italic or underlined text shouldn't be a second class concept,
> they have meaning that can be lost when text is conveyed in
> circa-1868-plain-text. 

  But I can still load and read circa-1968-plain-text files without issue,
on a computer that didn't even exist at the time, using tools that didn't
exist at the time.  The same can't be said for a circa-1988 Microsoft Word
file.  It requires either the software of the time, or specialized software
that understands the format.

> I've read many letters that predate the
> invention of the typewriter, emphasis is often conveyed using
> underlines or darkened letters. We've drawn this arbitrary line in the
> sand, where only letters that can be typed on a typewriter are "text",
> Everything else is fluff that has been arbitrarily decided to convey
> no meaning. I think it's a safe argument to make that the primary
> reason we've painted ourselves into this unexpressive corner is
> because of a dogged insistence that we cling to the keyboard.

  There were conventions developed for typewriters to get around this. 
Underlining text indicated italicized text (if the typewriter didn't have
the capability---some did).

  In fact, typewriters have more flexibility than computers do even today. 
Within the restriction of a typewriter (only characters and spaces) you
could use the back-space key (which did not erase the previous
character) and re-type the same character to get a bold effect.  You could
back-space and hit the underscore to get underlined text.  You could
back-space and hit the ` key to get a grave accent, and the ' to get an
acute accent.  With a bit more fiddling with the back-space and adjusting
the paper via the platen, you could get umlauts (either via the . or '
keys).

  I think the original intent of the BS control character in ASCII was to
facilitate this behavior, but alas, nothing ever did.  Shame, it's a neat
concept.
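
  The overstrike idea is simple enough to sketch.  What follows is only an
illustration in Python, under my own assumptions: the names bold(),
underline() and render() are made up for this example, and render() only
understands the two cases described above (the same character struck twice
is bold, an underscore struck in the same column is an underline).  Accents
are left out.

```python
BS = "\b"  # the ASCII backspace control character (code 8)


def bold(text: str) -> str:
    """Strike each character twice: character, backspace, character."""
    return "".join(f"{c}{BS}{c}" for c in text)


def underline(text: str) -> str:
    """Strike an underscore, back up, then strike the character over it."""
    return "".join(f"_{BS}{c}" for c in text)


def render(stream: str) -> list[tuple[str, set[str]]]:
    """Replay a stream on an imaginary carriage.

    Returns one (character, styles) pair per column, where styles may
    contain "bold" and/or "underline".
    """
    cells: list[list[str]] = []  # every strike that landed in each column
    col = 0
    for ch in stream:
        if ch == BS:
            col = max(col - 1, 0)  # back up one column, erase nothing
            continue
        if col == len(cells):
            cells.append([])
        cells[col].append(ch)
        col += 1

    out = []
    for strikes in cells:
        # A lone underscore is just an underscore, not an underline.
        glyphs = [s for s in strikes if s != "_"] or ["_"]
        base = glyphs[0]
        styles = set()
        if "_" in strikes and base != "_":
            styles.add("underline")
        if strikes.count(base) > 1:
            styles.add("bold")
        out.append((base, styles))
    return out
```

  Feeding render() the string bold("a") + underline("b") + "c" gives back
three columns: a bold "a", an underlined "b" and a plain "c".  The stream
itself is still plain text with one control character in it.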

> I like the C comment example; Why do I need to call out a comment with
> a special sequence of letters? Why can't a comment exist as a comment?

  The smart-ass answer is "because the compiler only looks at a stream of
text and needs a special marker" but I get the deeper question---is a plain
text file the only way to program?

  No.  There are other ways.  There are many attempts at so-called "visual
languages" but none of them have been used to any real extent.  Yes, there
are languages like Visual Basic or Smalltalk, but even with those, you still
type text for the computer to run.

  The only truly alternative programming language I know of is Excel. 
Seriously.  That's about the closest thing you get to a comment existing as
a comment without special markers, because you don't include those as part
of the program (specifically, you exclude those cells from the
computation lest you get an error).

> Why is a comment a second class concept? When I take notes in the
> margin, I don't explicitly need to call them out as notes. This
> extends to strings, why do I need to use quotes? I know it's a string
> why can't the computer remember that too? 

  Actually, this reminds me of another "really out there" language---Color
Forth.  Symbols, constants, code and strings aren't designated by special
characters, but by *color*.  It's an interesting concept, but that too has
its downsides (especially for color-blind, or even truly blind,
programmers).
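
  To make the idea concrete, here is a rough Python sketch of source code
stored as tokens that each carry a role, with the role standing in for
color.  This is NOT how Color Forth actually encodes anything; Role, Token,
executable() and to_plain_text() are hypothetical names invented for the
illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Hypothetical roles; an editor could show each one in its own color."""
    CODE = "code"
    STRING = "string"
    COMMENT = "comment"


@dataclass(frozen=True)
class Token:
    text: str
    role: Role  # the role travels with the token, not with its spelling


def executable(tokens: list[Token]) -> list[Token]:
    """What the machine would run: everything except comments."""
    return [t for t in tokens if t.role is not Role.COMMENT]


def to_plain_text(tokens: list[Token]) -> str:
    """Flatten to plain text; only here do the markers have to come back."""
    parts = []
    for t in tokens:
        if t.role is Role.STRING:
            parts.append('"' + t.text + '"')
        elif t.role is Role.COMMENT:
            parts.append("/* " + t.text + " */")
        else:
            parts.append(t.text)
    return " ".join(parts)


program = [
    Token("print", Role.CODE),
    Token("hello world", Role.STRING),
    Token("greet the user", Role.COMMENT),
]
```

  In this sketch the quotes and comment markers only appear when the program
is flattened by to_plain_text(); inside the token list a string is a string
and a comment is a comment because the role says so.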

> Why do I have to use the
> capabilities of a typewriter to describe that to the computer? There
> seems to be confusion that computers are inherently text based. They
> are only that way because we program them and use them that way, and
> because we've done it the same way since the day of the teletype, and
> it's _how it's done._
> 
> "Classic" Macs are a great example of breaking this pattern. There was
> no way to force the computer into a text mode of operating, it didn't
> exist. Right down to the core the operating system was graphical. When
> you click an icon, the computer doesn't issue a text command, it
> doesn't call a function by name, it merely alters the flow of some
> binary stuff flowing through the CPU in response to some other bits
> changing. Yes, the program describing that was written in text, but
> that text is not what the computer is interpreting.

  The computer is still interpreting [1], but it's not text.  It's
interpreting a spatial location with an action (mouse button press) to do
something (I cop to this being a semantic sleight-of-hand---sorry).

> I'm getting a bit philosophical, so I'll shut up now, but it's an
> interesting discussion.

  -spc (I'll agree to that)

[1]	In fact, it's interpreters all the way down.  The CPU is
	interpreting instructions from memory ...

