


   Re: [xml-dev] XML's Scylla and Charybdis - parse and regexp


sean.mcgrath@propylon.com (Sean McGrath) writes:
>Correctness or input fidelity - pick one - you cannot have both.

Of course you can have both, if you haven't been lulled to sleep by
chants of "Infoset, Infoset" or "XPath is the data model."  Heck, you
can even have both and deal with the PSVI, if you're that much of a

When XML first appeared, it seemed important that parsers be small and
easy to write.  XML 1.0 gave parser writers escape hatches on a number
of things, and developers frequently wrote to that minimum.  XML 1.0
locked some functionality in the parser, and developers never went to
the effort of exposing it.

Since then, we've built huge edifices of code on top of these parsers,
but I haven't seen anyone go back to retrieve what was thrown away in
the first round.  The Desperate Perl Hacker has been quite thoroughly
betrayed, first by XML 1.0, then by namespaces, then by a variety of
other devices that further separated the text from its supposed meaning.

There's nothing inherent in XML or in the languages used to process XML
that requires this division.  Java is plenty capable of providing text
renditions to accompany events or objects, if anyone thinks it valuable.
Perl, Python, C# - heck, I think I could do this in Pascal or AppleSoft
BASIC if I really had to do it.  The problem isn't the code - it's the
will.  It certainly takes extra effort.

I've been poking at this for years now, stuffing bits of code between
books and other projects.  I wrote up pretty much my whole process at
http://lists.xml.org/archives/xml-dev/200303/msg00568.html, and I'm
finally reaching the point where a framework is emerging that supports
text, events, and objects.  

When I'm done, you'll be able to collect a series of parsing events into
an object tree, play with the text, re-serialize that into a tree, and
drop that tree into events.  You'll be able to make changes to the
events or the object tree and have your changes made with minimal impact
on the original surrounding text - no need to obliterate all your entity
references to make changes in a document.
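
To make the idea concrete, here is a minimal sketch in Python. It is not
the framework described above (whose interface is javadoc and code), and
it is nowhere near a conforming XML parser. The names TOKEN, events, and
serialize are hypothetical, invented for this illustration. It only shows
that events can carry their original text, so that an edit to one event
leaves the surrounding text, entity references included, untouched.

```python
import re

# Hypothetical, deliberately tiny tokenizer: NOT a conforming XML parser.
# Each alternative matches one kind of markup; anything else is plain text.
TOKEN = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<pi><\?.*?\?>)"
    r"|(?P<end_tag></[^>]*>)"
    r"|(?P<start_tag><[^>]*>)"
    r"|(?P<entity_ref>&[^;\s]+;)"
    r"|(?P<text>[^<&]+)",
    re.DOTALL,
)


def events(source):
    """Yield (kind, raw_text) pairs covering every character of source."""
    pos = 0
    for match in TOKEN.finditer(source):
        if match.start() != pos:
            # finditer skipped something it could not tokenize.
            raise ValueError(f"cannot tokenize input at offset {pos}")
        yield match.lastgroup, match.group()
        pos = match.end()
    if pos != len(source):
        raise ValueError(f"cannot tokenize input at offset {pos}")


def serialize(event_list):
    """Rebuild the document by concatenating each event's raw text."""
    return "".join(raw for _, raw in event_list)


# Odd whitespace, entity references, and a comment all need to survive.
doc = '<p  class="x">AT&amp;T &copy; 2003<!-- keep me --></p>'
evts = list(events(doc))

# Input fidelity: the untouched event stream reproduces the input exactly.
round_trip = serialize(evts)

# A targeted change: edit text events only, leave everything else alone.
edited = [
    (kind, raw.replace("2003", "2004")) if kind == "text" else (kind, raw)
    for kind, raw in evts
]
result = serialize(edited)
```

In this sketch the entity references stay as separate events with their
source text intact, so changing the year does not expand &amp; or &copy;
and does not normalize the double space inside the start tag. A real
implementation would also have to handle CDATA sections, DOCTYPE
declarations, and attribute values containing ">", which this one
ignores.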

I'm not claiming that this framework will be the most efficient way to
process XML, or that it will solve all problems.  There's a huge amount
of work yet to do (an XPath implementation is crucial, and I've not yet
started that), and the primary interface for it is still through javadoc
and code.  

I intend, however, to demonstrate that "you can have both", and
hopefully other programmers will pick up on that and let more of us have
the benefits of both.

Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org

