On 12 Aug 2004 11:58, Jules Richardson wrote:
On Thu, 2004-08-12 at 11:32, Hans Franke wrote:
> On 11 Aug 2004 15:41, Cini, Richard wrote:
> > >>XML is platform neutral because it's basically ASCII, right?
> > Yes, true, but I think of XML more as a Web technology requiring a complex
> > parsing engine.
> Naa, XML parsing is as simple as parsing any other tagged format.
> You just start at the beginning of the data stream and wait for
> a tag start ('<'), identify the tag, and process the following
> information (until the closing tag) as needed. That's all. For
> my own little XML data storage I did an XML reader and writer in
> Applesoft Basic in a few dozen lines. That's all that's needed.
> Sure, if you want to do super-duper-crunch-everything readers,
> then it gets a bit more complex, but these are not really needed
> for real world applications (aka the ones really getting the data
> from a medium or putting it back).
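
The scan loop described above can be sketched in a few lines. This is only an illustration in Python, not the Applesoft Basic reader mentioned in the thread; the function name scan_tags and the event tuples are made up for the example, and it assumes tags carry no attributes.

```python
def scan_tags(data):
    """Yield ('open', name), ('close', name) or ('text', chunk) events."""
    pos = 0
    while pos < len(data):
        lt = data.find("<", pos)          # wait for a tag start
        if lt == -1:
            rest = data[pos:].strip()     # no more tags: the rest is text
            if rest:
                yield ("text", rest)
            return
        text = data[pos:lt].strip()       # information before the tag
        if text:
            yield ("text", text)
        gt = data.find(">", lt)
        if gt == -1:
            raise ValueError("unterminated tag at offset %d" % lt)
        name = data[lt + 1:gt].strip()    # identify the tag
        if name.startswith("/"):
            yield ("close", name[1:].strip())
        else:
            yield ("open", name)
        pos = gt + 1


events = list(scan_tags("<disk><name>boot</name></disk>"))
```

A reader for a real file would replace the event tuples with whatever action each tag calls for.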
Absolutely. There's a few gotchas, like if the opening tag ends with />
rather than > then you don't go scanning for a closing tag, and I seem
to recall having to do some strange things with character data fields
when parsing (plus there were probably caveats about escaping quotes in
attribute fields etc. - it's been a while)
Now, /> is already a closing tag, so if you find a / right at
the end of an opening tag, you just call the closing function
with an empty string as parameter - I found that way more convenient
than the behaviour of some parsers that tell you that you've found
an empty tag.
Also, adding CDATA sections is only necessary if you want to include
as-is data which is not to be parsed at all. In general that is
only needed if you include tags and/or entities within which are
not supposed to be parsed. It is necessary when storing binary data
or XML as data. Not really necessary here, but again, it is easy to
detect and can be handled with a few lines of code.
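
As a rough sketch of both points (again Python, with the hypothetical names read_elements and on_close, and assuming tags without attributes), a trailing / simply triggers the closing function with an empty string, and a CDATA section is copied verbatim up to its ]]> terminator:

```python
def read_elements(data, on_close):
    """Call on_close(name, content) whenever an element ends.

    content is the text collected since the last tag; for an empty
    tag such as <side/> it is the empty string.
    """
    pos = 0
    text = ""
    while True:
        lt = data.find("<", pos)
        if lt == -1:
            break
        text += data[pos:lt]
        if data.startswith("<![CDATA[", lt):
            # CDATA: take everything up to ]]> as is, nothing inside is parsed.
            end = data.find("]]>", lt)
            if end == -1:
                raise ValueError("unterminated CDATA section")
            text += data[lt + len("<![CDATA["):end]
            pos = end + len("]]>")
            continue
        gt = data.find(">", lt)
        if gt == -1:
            raise ValueError("unterminated tag")
        tag = data[lt + 1:gt]
        if tag.startswith("/"):
            on_close(tag[1:].strip(), text.strip())
        elif tag.endswith("/"):
            # <name/> is already a closing tag: report empty content.
            on_close(tag[:-1].strip(), "")
        # Any tag (opening ones included) starts a fresh text buffer.
        text = ""
        pos = gt + 1


seen = []
read_elements(
    "<disk><name>boot</name><side/><raw><![CDATA[<b>&]]></raw></disk>",
    lambda name, content: seen.append((name, content)),
)
```

The caller never needs a separate "empty tag" case; it sees the same closing call either way.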
Certainly nothing too complex though; error handling of the XML format's
probably the bit that takes most of the effort (coping with malformed
data without crashing the parser or eating up memory etc.)
Again, not really an issue. A reader only needs to understand
the structure as it is used. In such a real world context, error handling
is easy - as easy as for any other format. If some identifier comes along
that does not fit at the current position: print an error message and stop
execution. That's all that's needed. We don't have the need here to work like an
HTML browser, which tries to display a somewhat readable page, no matter
how broken the tags are, hoping that the user may find some sense
within. We store and retrieve data. If the data is corrupted, we alert
the user and let him sort out the situation (or some 'checkfile' utility - but
that's already optional).
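
A strict reader of that kind might look like the following sketch. The file layout (disk, name, tracks), the FormatError class and the function names are invented for illustration only; the point is that the first tag that does not fit the expected position ends the read.

```python
import re
import sys


class FormatError(Exception):
    """Raised when the data does not match the layout the reader expects."""


# Hypothetical fixed layout: <disk><name>..</name><tracks>..</tracks></disk>
LAYOUT = ["disk", "name", "/name", "tracks", "/tracks", "/disk"]


def read_disk(data):
    """Return {'name': ..., 'tracks': ...} or raise FormatError."""
    fields = {}
    expected = iter(LAYOUT)
    pos = 0
    for match in re.finditer(r"<([^<>]*)>", data):
        tag = match.group(1).strip()
        wanted = next(expected, None)
        if tag != wanted:
            what = "<%s>" % wanted if wanted else "end of data"
            raise FormatError("unexpected <%s> at offset %d, expected %s"
                              % (tag, match.start(), what))
        if tag in ("/name", "/tracks"):
            # Text between the previous tag and this closing tag is the value.
            fields[tag[1:]] = data[pos:match.start()].strip()
        pos = match.end()
    missing = next(expected, None)
    if missing is not None:
        raise FormatError("data ends before <%s>" % missing)
    return fields


def load_or_stop(data):
    """Print an error message and stop execution on corrupted data."""
    try:
        return read_disk(data)
    except FormatError as err:
        print("error:", err)
        sys.exit(1)


info = read_disk("<disk><name>boot</name><tracks>35</tracks></disk>")
```

No recovery is attempted; repairing a damaged file is left to the user or a separate tool.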
Regards
H.
--
VCF Europa 6.0 on 30 April and 1 May 2005 in Munich
http://www.vcfe.org/