Text encoding Babel. Was Re: George Keremedjiev

Guy Dunphy guykd at optusnet.com.au
Thu Nov 22 18:55:18 CST 2018


At 10:33 PM 21/11/2018 -0500, ED SHARPE wrote:
>if I type an extra space I am sure every one sees it. but the chars not everyone sees them. 
>what I do figure us the older email programs are not accepting of all charter sets? ( dunno if I am using the right term)
>
>Sent from AOL Mobile Mail

Ah ha! Mystery explained. I'm another who sees funny characters where Ed's mails contain "c2 a0".
This is the UTF-8 encoding of a 'no-break space' character, which is NOT in the original ASCII set.
See https://apps.timwhitlock.info/unicode/inspect/hex/c2/a0
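
For anyone who wants to check this themselves, here is a minimal Python sketch (the variable names are just for illustration). It encodes U+00A0 as UTF-8 and then decodes the two resulting bytes the way a client assuming ISO-8859-1 would, which is one plausible way to end up with the circumflex-A that John reports.

```python
# U+00A0 is the Unicode NO-BREAK SPACE; it is outside 7-bit ASCII.
NBSP = "\u00a0"

# UTF-8 needs two bytes for any code point in the range U+0080..U+07FF.
utf8_bytes = NBSP.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))   # c2 a0

# A reader that assumes ISO-8859-1 treats each byte as its own character:
# 0xC2 is LATIN CAPITAL LETTER A WITH CIRCUMFLEX, 0xA0 is a no-break space.
misread = utf8_bytes.decode("iso-8859-1")
print([f"U+{ord(ch):04X}" for ch in misread])     # ['U+00C2', 'U+00A0']
```

So one character on the sending side turns into two on a receiving side that guesses the wrong charset: a visible accented A followed by a space-like character.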

I see them because I'm using an old email client - Eudora 3 (1997). I stick with this specifically
_because_ it doesn't understand UTF-8 or any other non-ASCII coding, especially in the header, and
hence simply ignores any executables in the headers or email body. Which makes it totally virus proof,
unlike Microsoft's intentionally open-backdoor junk like Outlook. And most other email 'modern wonders.'
Eudora barely even understands html in emails, and I'm fine with that. Also I have it configured to
dust-bin any incoming mail containing UTF-8 chars in the Subject header. Avoids a lot of time-wasting.
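
As an illustration of that kind of rule (not how Eudora's filters are actually configured), here is a rough Python sketch. The function name `subject_uses_utf8` is hypothetical. It assumes the raw Subject header is available as a string, and flags it if it contains an RFC 2047 encoded-word that names UTF-8 as its charset, or any character outside ASCII.

```python
import re

# RFC 2047 encoded-word naming UTF-8, e.g. "=?UTF-8?B?w4I=?=" or "=?utf-8?q?...?="
_UTF8_WORD = re.compile(r"=\?utf-8\?[bq]\?[^?]*\?=", re.IGNORECASE)

def subject_uses_utf8(raw_subject: str) -> bool:
    """Return True if the raw Subject header appears to carry UTF-8 content.

    Hypothetical helper for illustration only.
    """
    if _UTF8_WORD.search(raw_subject):
        return True
    # Unencoded non-ASCII characters in a header also count.
    return not raw_subject.isascii()

print(subject_uses_utf8("Text encoding Babel. Was Re: George Keremedjiev"))
print(subject_uses_utf8("=?UTF-8?B?w4I=?= hello"))
```

A real filter would work on the raw header bytes rather than a decoded string, so treat this only as the shape of the test.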

Anyway, I was wondering how Ed's emails (and sometimes others elsewhere) acquired that odd corruption.
Answer: Ed's email util (AOL Mobile Mail, and probably various other 'content enhanced' email clients)
interprets the user typing space twice in succession as meaning "I really, really want there to be a space
here, no matter what." So it inserts a 'no-break space' Unicode character, which of course requires a
2-byte UTF-8 encoding. Then adds a plain ASCII space 0x20 just to be sure.
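
A small Python sketch of that behaviour, with the caveat that it only models my guess about what the client does; I have not seen AOL's code. Both function names are hypothetical. The second function shows how a receiving program could undo the substitution.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

def simulate_double_space(text: str) -> str:
    """Assumed client behaviour: each pair of typed spaces becomes NBSP + space."""
    return text.replace("  ", NBSP + " ")

def normalize_spaces(text: str) -> str:
    """Receiver-side cleanup: turn every no-break space back into a plain space."""
    return text.replace(NBSP, " ")

typed = "who  knows?"                  # user typed two spaces after "who"
sent = simulate_double_space(typed)

print(sent.encode("utf-8"))            # b'who\xc2\xa0 knows?'
print(normalize_spaces(sent) == typed) # True
```

The bytes c2 a0 followed by 20 are exactly the pattern showing up in Ed's mails.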

Personally I find it more interesting than annoying. Just another example of the gradual chaotic devolution
of ASCII, into a Babel of incompatible encodings. Not that ASCII was all that great in the first place.
It's also interesting that even on cctalk, where you'd think everyone would be aware of the differences
between ASCII and later 'extensions', low level coding schemes, and the desirability of sticking to common
standards, some are not.

Takeaway: Ed, one space is enough. I don't know how you got the idea people might miss seeing a
single space, and so you need to type two or more. But it isn't so. The normal convention in plain
text is one space character between each word. And since plain ASCII is hard-formatted, extra spaces
are NOT ignored and make for wider spacing between words. Which  looks    very       odd, even if
your mail utility didn't try to do something 'special' with your unusual user input.


Btw, I changed the subject line, because this is a wider topic. I've been meaning to start a conversation
about the original evolution of ASCII, and various extensions. Related to a side project of mine.

But first, I'm having a problem with some portion of cctalk posts going missing, i.e. I don't receive all messages.
The ratio seems to vary day to day. Sometimes nothing is obviously missing, sometimes a lot is.
Still don't know why, or how to fix this. Any suggestions?

Guy



>On Wednesday, November 21, 2018 Fred Cisin <cisin at xenosoft.com> wrote:
>Ed,
>It is YOUR mail program that is doing the extraneous insertions, and 
>then not showing them to you when you view your own messages.
>
>ALL of us see either extraneous characters, or extraneous spaces in 
>everything that you send!
>I use PINE in a shell account, and they show up as a whole bunch of 
>inappropriate spaces.
>
>Seriously, YOUR mail program is inserting extraneous stuff.
>Everybody? but you sees it.
>
>> who  knows?   what  mail program  are  you using that   does that?
>It is YOUR mail program that is "doing that"!!
>
>
>On Wed, 21 Nov 2018, ED SHARPE via cctalk wrote:
>
>> who  knows?   what  mail program  are  you using that   does that?
>>
>>
>> In a message dated 11/21/2018 1:25:08 PM US Mountain Standard Time, cctalk at classiccmp.org writes:
>>
>>  
>> At 02:03 PM 11/21/2018, ED SHARPE via cctalk wrote:
>>
>>> I sold him my extra classic 8 with the plexi covers on it... sn 200 series.... we kept sn #18
>>
>> Side question: What process is turning non-blanking spaces into ISO-8859-1
>> circumflex-A for you?
>>
>> I see 'Â' all throughout your emails.
>>
>> - John
>
>

