Why TEXT must not be tolerated (Was: [personal] Re: PDF datasheets (was Re: Sources for 8b TTL keyboards(Keytronics))

24 Dec 2008

...
> >BTW, I do NOT see variations in word size or character set, as being
> >relevant.
On Wed, 24 Dec 2008, Tom Peters wrote:
I actually appreciate your pointing out to me that machine readable (and
reworkable) text is a throwback to more primitive times.  Please don't
feel that this rant is directed at you personally.

...
> HTML can be rendered so radically different on various platforms. What
> would the offset measure? Bytes? Characters? (those two would be very
> different) Lines? How could you possibly compute that? Would a byte offset
> be meaningful? A byte offset that lands you in the middle of a script or
> code of some other type would be the same screen position as a byte offset
> that's hundreds of bytes (or thousands) different; how would that be
> useful?
<RANT>
Do you see those issues as being ones that you could not resolve?
I think that you are capable of writing code that can identify specific
content within a work.  Word boundaries are hardly a significant project.
I have been talking about software that analyzes content.  Should
non-relative jumps be eliminated from code based on "what happens if it
happens to point to the middle of a script"?  No.  The address for the
jump, just like the address in a pointer, is created to point to what you
want it to point to.
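
To make the "word boundaries" point concrete, here is a minimal sketch in Python. It assumes the work is already available as plain text; the helper name snap_to_word is hypothetical and not taken from any existing tool. Given an arbitrary offset, it returns the span of the word that contains it, or of the next word if the offset lands between words.

```python
import re

# A "word" here is any run of letters, digits, or underscores (an assumption).
WORD = re.compile(r"\w+")

def snap_to_word(text, offset):
    """Return (start, end) of the word containing offset, or of the next
    word after it; return None if no word ends after the offset."""
    for match in WORD.finditer(text):
        if match.end() > offset:
            return match.start(), match.end()
    return None

text = "Word boundaries are hardly a significant project."
start, end = snap_to_word(text, 7)   # offset 7 falls inside "boundaries"
word = text[start:end]
```

An offset that happens to land on a space or on punctuation resolves to the following word, so a slightly stale offset still points at something meaningful.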

...
> Ever see a web page rendered in Lynx? It's just text and nothing more.
Is ALL content "useless" if just text and nothing more?
Last night I googled and read a copy of "Cyrano de Bergerac".  I don't NEED
fancy rendering for everything.  Are the words not useful without control
of the font?

Text is intrinsically a sequence of "character"s.  Yes, if you are so
inclined, you COULD scan a rendering of text, and have a readable screen
display that is actually a graphics image with no practical access to the
underlying text.

When working with text, the basic unit is a "character"; a character count
can often/usually be handled by a BYTE count.
For this pointer to be USEFUL, it needs to work for text.  It does NOT
diminish its usefulness if it cannot also point to Uncle Charlie's nose
in a photograph.
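
As a rough illustration of why a character count can usually be handled by a byte count, the following Python sketch converts between the two. It assumes the text is encoded as UTF-8; both function names are hypothetical. For pure ASCII text the two offsets are identical, and they diverge only after the first multi-byte character.

```python
def char_to_byte_offset(text, char_offset):
    """Byte offset (in UTF-8) of the character at char_offset."""
    return len(text[:char_offset].encode("utf-8"))

def byte_to_char_offset(data, byte_offset):
    """Character offset for a byte offset into UTF-8 data.
    An offset that lands inside a multi-byte character is backed up
    to the first byte of that character."""
    while 0 < byte_offset < len(data) and (data[byte_offset] & 0xC0) == 0x80:
        byte_offset -= 1          # 10xxxxxx marks a continuation byte
    return len(data[:byte_offset].decode("utf-8"))

ascii_text = "Cyrano de Bergerac"
accented = "Caf\u00e9 text"        # one two-byte character at index 3

same = char_to_byte_offset(ascii_text, 7)      # ASCII: bytes == characters
shifted = char_to_byte_offset(accented, 5)     # one byte further along
recovered = byte_to_char_offset(accented.encode("utf-8"), 4)
```

A byte offset that lands in the middle of a character is no worse than a jump into the middle of an instruction: the fix is to compute the address properly, not to give up on addresses.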

...
> The whole point of a web browser is that the content is rendering engine
> independent. I think that some sort of offset is a step backwards and is
> unlikely to be implemented.
The whole point of that is to RESTRICT content access to only the form in
which it gets rendered by the animal byproducts plant, and to PREVENT
access to the internal content at any point other than explicitly
"marked" by its creator.

Yes, the intrusion into the internal content, rather than pure consumerism
of the rendering is INDEED a step backwards.  I'm surprised that the
retrogressive behavior of Google is tolerated!  They store a URL and an
offset in their index files.  I'm advocating that inclusion of the
[OPTIONAL] offset tag within the URL improves the capabilities of "low
level" access, such as indexing.  Surely, we should immediately replace
ALL HTML of text with graphic images [in non-OCRable fonts] to put a stop
to that step backwards of access for indexing by those who are not the
original creators of the page.
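
For anyone who wants to see what "a URL and an offset" looks like in practice, here is a small sketch in Python. It is only an illustration under stated assumptions, not a description of how any real search engine stores its index: the "#offset=N" fragment syntax, the function names, and the example.com URL are all hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urldefrag

def build_index(pages):
    """Map each lowercased word to a list of (url, character offset)."""
    index = defaultdict(list)
    for url, text in pages.items():
        for match in re.finditer(r"\w+", text):
            index[match.group().lower()].append((url, match.start()))
    return index

def with_offset(url, offset):
    """Attach the (hypothetical) optional offset tag to a URL."""
    return f"{url}#offset={offset}"

def split_offset(link):
    """Return (url, offset), or (url, None) if no offset tag is present."""
    url, fragment = urldefrag(link)
    if fragment.startswith("offset=") and fragment[7:].isdigit():
        return url, int(fragment[7:])
    return url, None

pages = {"http://example.com/cyrano.txt": "Text is a sequence of characters"}
index = build_index(pages)
url, offset = index["sequence"][0]
link = with_offset(url, offset)
```

Because the tag is optional, a link without it still parses and simply yields no offset, so nothing that ignores the tag is broken by it.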

Yes, I guess that I AM the only one who wants to access the internals of
content, not just watch the pretty pictures.  I am indeed guilty of that
most heinous of crimes, indexing and computer analysis of text.

--
Grumpy Ol' Fred (SEVERAL steps backwards)    		cisin at xenosoft.com

Sorry, there is no </RANT>
