Text encoding Babel.

Guy Dunphy guykd at optusnet.com.au
Fri Nov 30 19:53:43 CST 2018

Ouch, what was I thinking? Mentioning a project I fundamentally can't talk in detail about yet; not very smart.
Thus spawning a thread guaranteed to go chaotic. Sorrrry!

Also I've changed the title, since it's disrespectful to drag a deceased person's name along with this.

I've been busy a couple of days, didn't have time to follow the thread. Still busy, but briefly with extracts:

@ Keelan Lightfoot
>  Our problem isn't ASCII or Unicode, our problem is how we use computers.
>  Markup languages are a kludge, relying on plain text to describe higher level concepts.
 [snip lots]

Nice post, and I agree with all of it. This is the type of thinking needed, and in general much like my approach. Except I'm a software and hardware designer, synthesist, and pursue practical results. Or at least _try_ to.
Funny you mention keyboards, as that's one of the project's bootstrapping steps. First a simulated keyboard (HTML & JS initially) to allow free experimentation, later an open hardware design suitable for makers, 3D printing, etc. The crappiness of commercial keyboards is a bugbear of mine. Keyboards should be MUCH better than they are. And last forever.

@ Grant Taylor & Toby Thain
>>  · bold
>>  · italic
>>  · overline
>>  · strike through
>>  · underline
>>  · superscript exclusive or subscript
>>  · uppercase exclusive or lowercase
>>  · opposing case
>>  · normal (none of the above)
>This covers only a small fraction of the Latin-centric typographic
>palette - much of which has existed for 500 years in print (non-Latin
>much older). Computerisation has only impoverished that palette, and
>this is how it happens: Checklists instead of research.
>Work with typographers when trying to represent typography in a
>computer. The late Hermann Zapf was Knuth's close friend. That's the
>kind of expertise you need on your team.

More generally, an encoding standard needs to allow for ANY kind of present and future characters, fonts and modifiers.
But even more critically, it has to allow for such things without reference to 'central standards groups'. Enforced centralism is poison. For instance Unicode, with its vast table of symbols, still doesn't include decent arrows (among many other needs). What's required is a way for any bunch of people to define their own character sets, fonts, adornments, etc., create definition files for them, and use those among themselves. Either embedded in documents or used as referenced defaults - both must be possible. It is easy enough to define a base encoding that allows this, and in which legacy codings (ASCII, Unicode, etc.) are among the available defaults.

The point with embedding such capabilities in the base coding scheme, and then building the superstructure of computing language and OS on top of that, is to achieve a scheme in which human language and typesetting freedom is available through the entire structure. 
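The post deliberately gives no format details, so the following is purely my illustration of one conceivable shape for such a community-defined character-set file: a self-describing definition that a document could embed or reference as a default, with no central registry involved. All names and fields here are hypothetical.

```python
# Purely illustrative (not the author's format): a community-defined
# character-set definition that can travel with a document.

import json

charset_def = {
    "name": "example-arrows-v1",       # hypothetical identifier
    "base": "ascii",                   # legacy coding kept as a default
    "glyphs": {
        "0x80": {"name": "arrow-up-double", "ref": "glyphs/arrow_up2.svg"},
        "0x81": {"name": "arrow-cw-loop",   "ref": "glyphs/arrow_cw.svg"},
    },
}

blob = json.dumps(charset_def)   # embed in a document, or publish a file
restored = json.loads(blob)      # any peer can use it, no registry needed
assert restored["base"] == "ascii"
```

The point of the sketch is only that "definition file" need not mean anything heavier than a small self-contained description that both embedding and external reference can share.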

@ Cameron Kaiser
>> Surely a Chinese or Japanese based programming language could be 
>> developed.

>The Tomy Pyuuta has a very limited BASIC variant called G-BASIC which has
>Japanese keywords and is programmed with katakana characters (such as "kake" ...

Exactly, except it should be possible for any group (e.g. speakers of whatever language) to adapt an existing computer language to their own human dialect. With compilers and assemblers this is not trivial, but with dictionary-based interpreters it's much easier. The keywords and operators are all just looked up in tables to achieve effects, and whatever characters or ideograms serve as the keywords are entirely flexible.
Then imagine one interpreted scripting language that serves multiple functions: document layout, user apps and OS scripting. And that scripting language can be phrased in any human language, AND includes full typesetting of itself.
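The lookup-table idea above can be sketched in a few lines. This is a minimal illustration of my own (the non-English keywords are hypothetical, loosely inspired by the G-BASIC example quoted earlier): the interpreter's actions are canonical, and each community binds its own surface keywords to them.

```python
# Sketch: a dictionary-based interpreter whose keywords are just
# lookup-table entries, so they can be rebound to any human language.

ACTIONS = {
    "print": lambda args, env: print(*args),
    "set":   lambda args, env: env.__setitem__(args[0], args[1]),
}

# Keyword tables: surface word -> canonical action name.
ENGLISH  = {"print": "print", "set": "set"}
JAPANESE = {"カケ": "print", "イレロ": "set"}   # hypothetical katakana keywords

def run(line, keywords, env):
    word, *args = line.split()
    return ACTIONS[keywords[word]](args, env)

env = {}
run("set x 42", ENGLISH, env)       # stores "42" under "x"
run("カケ hello", JAPANESE, env)     # same interpreter, different dialect
```

The interpreter core never changes; only the keyword table does, which is why localisation is cheap for interpreters and painful for fixed-grammar compilers.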

@ Liam Proven
> There are a wider panoply of options to consider.
> Try to collapse all these into one and you're doomed.

Lots of great references, thanks! As for doomed... well we'll see. I think the trick is to merely provide a mechanism for including extensible classes of 'stuff' in the base coding. Because being rigid about the mechanics of the higher level capabilities really is fatal. Fortunately, 'flexible extensibility' isn't so hard to do. Especially when you have a bunch of disused legacy control codes to work with.

At 02:34 PM 28/11/2018 -0700, Jim Manley wrote:
>Some computing economics history:
>I'm an engineer and scientist by both education and experience, 
>A theoretically "superior" encoding may
>not see practical use by a significant number of people because of legacy
>inertia that often makes no sense, but is rooted in cultural, sociological,
>emotional, and other factors, including economics.

Yep. I'm intensely aware of the economics and inertia factors. Points:
1. The ASCII-replacement coding is just a part of a wider project.

2. It's all a private project, for fun.

3. And yet there's a convergence of developments suggesting an opportunity in near future.
 MS/Intel are bastardizing, backdooring and box-closing the Wintel platform into something so evil even non-technical people are getting sick of it. This will continue, due to political agenda of MS/Intel.
 Simultaneously the competing Linux world is fragmenting into churn-chaos. (Complex but irreversible reasons.)
 Apple is... Apple. Becoming a platform based mostly on virtue signalling, and increasingly as bad as Wintel.

4. If it ever is released, it will be freeware, open hardware and copylefted. DRM specifically banned from the platform. With many quite appealing wow-factors, several of which will be totally killer. It is not politically possible for MS/Intel/Apple to follow this path.


>Logic and reasoning are
>simply nowhere near enough to create the conditions necessary for
>widespread adoption - sometimes it's just good luck in timing (or, bad
>luck, as the case may be).

Absolutely. It's mostly about politics and meme-crafting.  Ref: Marx, L Ron Hubbard,
Mao, various religions, etc. Odd isn't it - so few instances of memetic weavers who
used their skills for the benefit of humankind. As opposed to those guys above, who
were all arseholes with pretty twisted objectives. Did you know L Ron Hubbard created
Scientology to win a drunken bet in a bar? Someone said "I bet you can't create a
religion!" And L Ron said "I bet I can!"

>ASCII was developed in an age when Teletypes ...


>You can't blame the ASCII developers for lack of foresight when no one in
>their right mind back then would have ever predicted we could have upwards
>of a trillion bytes of memory in our pockets ...

Absolutely. ASCII was a godsend at the time and I take pains to make this clear in the proposal docs. This is a _hindsight_ refactoring.

>Someone thinking that they're going to make oodles of money from some
>supposedly new-and-improved proprietary encoding "standard" that discards
>five-plus decades of legacy intellectual and economic investment, is
>pursuing a fool's errand.

Ha ha, I don't intend to even try to make any money from this. Other objectives.
Though, I'd probably set up a donations channel. Just in case people like it.

>  Even companies with resources at the level of
>Apple, Google, Microsoft, etc., aren't that arrogant, and they've
>demonstrated some pretty heavy-duty chutzpah over time.  BTW, you won't be
>able to patent what apparently amounts to a lookup table, and even if you
>copyright it,

Patents and copyright are poisons that are crippling intellectual and technological progress. The original concepts were OK, but got over-extended by greed (and still getting worse.) Patents in particular have become a tool for big corporate suppression of any potential competition, while copyright is used to destroy free expression. The entire DRM/copyright legal framework should be nullified.
This project will be intentionally copyright and patent excluding. Freeware, published, open source, open hardware, etc. Just a conformance symbol, which certifies (among other things) that _nothing_ in the systems & software is under any kind of DRM restriction. People buy or build such a system, they own it entirely.
This is why I can't mention details or coined terminology now.

>True standards are open nowadays - the days of proprietary "standards" are

Except that by 'open' they usually mean you can pay a lot of money for a copy of the standard doc.
That's not what I call 'open.'

>a couple of decades behind us - even Microsoft has been publishing the
>binary structure of their Office document file formats.  The specification
>for Word, that includes everything going back to v 1.0, is humongous, and
>even they were having fits trying to maintain the total spec, which is
>reportedly why they went with XML to create the .docx, .xlsx, .pptx, etc.,
>formats.  That also happened to make it possible to placate governments
>(not to mention customers) that are looking for any hint of
>anti-competitive behavior, and thus also made it easier for projects such
>as OpenOffice and LibreOffice to flourish.
>Typographical bigots, who are more interested in style than content, were
>safely fenced off in the back rooms of publishing houses and printing
>plants until Apple released the hounds on an unsuspecting public.  I'm
>actually surprised that the style purists haven't forced Smell-o-Vision
>technology on The Rest of Us to ensure that the musty smell of old books is
>part of every reading "experience" (I can't stand the current common use of
>that word).  At least I have the software chops to transform the visual
>trash that passes for "style" these days into something pleasing to _my_
>eyes (see what I did there with "severely-flawed" ASCII?  Here's how you
>can do /italics/ and !bold! BTW.).

Oh yes, tell me about it. 'Do it this way' bigots of all kinds. Pick any possible thing that can be done more than one way, and there will be camps of fanatics insisting their one way is the true way and all others are crazy.
Finding such artificial dichotomies (or n-way splits) has been a very rich source of inspiration for holistic rethinking.

Btw, again I'll emphasize that when I say ASCII is severely flawed, I mean this in the context of what we know now about information coding requirements, and creating extensible systems. It wasn't 'severely flawed' back when it was created.

>Nothing frosts me more than reading text that can't be resized and
>auto-reflowed, especially on mobile devices with extremely limited display
>real estate.  I'm fully able-bodied and I'm perturbed by such bad design,
>so, I'm pretty sure that pages that prevent pinch-zooming, and that don't
>allow for direct on-display text resizing/auto-reflow, violate the spirit
>completely, if not virtually all of the letters, of the Americans with
>Disabilities Act (and similar legislation outside the U.S., I imagine).

Well, there's more than that one requirement. If one wanted to capture a historical document, the absolute image of the page(s) is a core aspect, and can't be 'reflowed'. But otoh, the text content should be accessible as a searchable and reflowing character stream. A decent coding scheme will support both objectives simultaneously.
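As a concrete illustration of that dual requirement (my sketch, not the author's scheme; the field names are invented), a page can carry both a fixed facsimile image and a stream of text runs anchored to positions on it, so the same object yields an unreflowable page view and a searchable, reflowable character stream:

```python
# Sketch: one way an encoding can serve both the "absolute page image"
# and "reflowable text stream" objectives at once.

from dataclasses import dataclass, field

@dataclass
class TextRun:
    text: str
    bbox: tuple          # (x, y, w, h) anchoring the run on the page image

@dataclass
class Page:
    image_ref: str       # reference to the facsimile bitmap
    runs: list = field(default_factory=list)

    def plain_text(self):
        # The reflowable/searchable view: just the character stream.
        return " ".join(run.text for run in self.runs)

page = Page("scan_001.png")
page.runs += [TextRun("Service", (10, 10, 60, 12)),
              TextRun("Manual",  (75, 10, 55, 12))]
assert page.plain_text() == "Service Manual"
```

Existing formats such as hOCR and PDF-with-text-layer take roughly this shape; the point is that neither view has to be sacrificed for the other.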

Btw I'm constantly amazed by how badly tech docs are being 'digitized' even now. Service manuals with fold-out schematics, screened tonal multi-colour illustrations, etc... just endless awful digital copy fails. Meanwhile the original paper copies get rarer and rarer, because idiots think 'those are all online now, paper copies are obsolete' and throw them out.

@ Keelan Lightfoot
>from a usability standpoint, control codes are
>problematic. Either the user needs to memorize them, or software needs
>to inject them at the appropriate times.

You're thinking of 'control codes' as something you type by holding down CTRL and some other key. Yes, these are a pain and I personally hate UIs that depend on memorising lots of them.
But strictly speaking, 'control codes' are the byte codes 0x00 to 0x1F in the ASCII table (plus DEL, 0x7F), most of which are now little used apart from in hardware protocols. How those would be brought into use in an ASCII replacement and new UI is another topic. Sadly, part of the area I won't talk about. Just bear in mind that this system includes new keyboard designs, and 'things that have to be memorised' are fine for some people but not for others (including me.)
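For readers without the table to hand, here is the standard ASCII control-code layout the paragraph above refers to (nothing project-specific, just the published C0 set):

```python
# Standard ASCII: the C0 control codes occupy 0x00-0x1F, with DEL
# (0x7F) as the one extra control outside that range.

C0_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US",
]

def ctrl_name(code):
    """Return the mnemonic for an ASCII control code, else None."""
    if 0x00 <= code <= 0x1F:
        return C0_NAMES[code]
    if code == 0x7F:
        return "DEL"
    return None   # printable or out of range: not a control code

assert ctrl_name(0x1B) == "ESC"
assert ctrl_name(0x41) is None   # 'A' is printable
```

Codes like SOH/STX/ETX/EOT and the device controls (DC1-DC4, i.e. XON/XOFF territory) are the ones with the richest history in communications protocols.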

Ha ha, even ctrl-C and ctrl-V for copy and paste are a pain, not because they must be memorised, but because the ergonomics of contorting the fingers to type them is horrible for such a common action. Stuff like this...
Oh, and if you are wondering whether I'm imagining some huge keyboard with even more keys, no. Personally I use a short (tenkeyless) keyboard, and don't ever want to have to go back to stupidly big keyboards.

>In addition to crusty old computers, I also enjoy the company of three
>crusty old Linotypes. In fact, that's what got me thinking about this
>stuff in the first place.

Ah, I am intensely jealous! I wish I could find an old but working Linotype. And someone to teach me how to use it. Hot lead, yeah! (I used to cast things in lead as a child, have done bronze casting and intend to do more.)
I have some exposure to typesetting & printing; enough to know how much I don't know. Some articles on related topics are in-progress, but not yet posted.

Anyway, back on topic (classic computing). Here's an ASCII chart with some control codes highlighted.

I'm collecting all I can find on past (and present) uses of the control codes. Especially the ones highlighted in orange. Not having a lot of success in finding detailed explanations, beyond very brief summaries in old textbooks.

Note that I'm mostly interested in code interpretations in communications protocols. Their use in local file encodings not so much, since those are the domain of legacy application software and wouldn't clash with redefinition of what the codes do, in future applications.

And now, back to machining a lock pick for a PDP-8/S front panel cylinder lock.


More information about the cctalk mailing list