Simon St.Laurent wrote:
>sean.mcgrath@propylon.com (Sean McGrath) writes:
>
>
>>Correctness or input fidelity - pick one - you cannot have both.
>>
>>
>
>Of course you can have both, if you haven't been lulled to sleep by
>chants of "Infoset, Infoset" or "XPath is the data model." Heck, you
>can even have both and deal with the PSVI, if you're that much of a
>masochist.
>
>When XML first appeared, it seemed important that parsers be small and
>easy to write. XML 1.0 gave parser writers escape hatches on a number
>of things, and developers frequently wrote to that minimum. XML 1.0
>locked some functionality in the parser, and developers never went to
>the effort of exposing it.
>
>Since then, we've built huge edifices of code on top of these parsers,
>but I haven't seen anyone go back to retrieve what was thrown away in
>the first round. The Desperate Perl Hacker has been quite thoroughly
>betrayed, first by XML 1.0, then by namespaces, then by a variety of
>other devices that further separated the text from its supposed meaning.
>
>There's nothing inherent in XML or in the languages used to process XML
>that requires this division. Java is plenty capable of providing text
>renditions to accompany events or objects, if anyone thinks it valuable.
>Perl, Python, C# - heck, I think I could do this in Pascal or AppleSoft
>BASIC if I really had to do it. The problem isn't the code - it's the
>will. It certainly takes extra effort.
>
>I've been poking at this for years now, stuffing bits of code between
>books and other projects. I wrote up pretty much my whole process at
>http://lists.xml.org/archives/xml-dev/200303/msg00568.html, and I'm
>finally reaching the point where a framework is emerging that supports
>text, events, and objects.
>
I am not sure exactly what your requirements are, but the XmlPull API
provides an optional feature to enable exact roundtripping, which I
implemented in MXP1. This can be used as an efficient lower-level layer on
top of which higher layers of events, trees, or whatever can be built.
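
To make the idea concrete, here is a minimal sketch in Python of a low-level
layer whose events each carry their exact source text, so that concatenating
them reproduces the input byte for byte. This is not the XmlPull or MXP1 API;
the names Event, tokenize, and serialize are hypothetical, and the tokenizer
is a toy that ignores CDATA sections and '>' inside attribute values.

```python
import re
from dataclasses import dataclass

# Toy tokenizer, NOT a conforming XML parser. Empty-element tags and
# declarations such as <!DOCTYPE ...> are simply labelled "start_tag".
TOKEN = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<pi><\?.*?\?>)"
    r"|(?P<end_tag></[^>]*>)"
    r"|(?P<start_tag><[^>]*>)"
    r"|(?P<entity_ref>&[^;\s]+;)"
    r"|(?P<text>[^<&]+)",
    re.DOTALL,
)


@dataclass
class Event:
    kind: str  # which alternative of TOKEN matched
    raw: str   # the exact source text behind this event


def tokenize(source):
    """Split source into events that together cover every character."""
    events = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        events.append(Event(match.lastgroup, match.group()))
        pos = match.end()
    return events


def serialize(events):
    """Re-emit the document by joining the raw text of each event."""
    return "".join(event.raw for event in events)


doc = '<note  id = "n1">\n  <to>Tom &amp; Jerry</to><!-- keep me -->\n</note >'
events = tokenize(doc)

# Odd whitespace, the comment, and the entity reference all survive.
assert serialize(events) == doc

# Change one text event; everything around it is left untouched.
for event in events:
    if event.kind == "text" and event.raw == "Tom ":
        event.raw = "Tim "
assert serialize(events) == doc.replace("Tom ", "Tim ")
```

Higher layers (trees, objects) could be built over such events while keeping
the raw field around, which is what lets an edit leave entity references and
whitespace elsewhere in the document alone.
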
>When I'm done, you'll be able to collect a series of parsing events into
>an object tree, play with the text, re-serialize that into a tree, and
>drop that tree into events. You'll be able to make changes to the
>events or the object tree and have your changes made with minimal impact
>on the original surrounding text - no need to obliterate all your entity
>references to make changes in a document.
>
>I'm not claiming that this framework will be the most efficient way to
>process XML, or that it will solve all problems. There's a huge amount
>of work yet to do (an XPath implementation is crucial, and I've not yet
>started that), and the primary interface for it is still through javadoc
>and code.
>
>I intend, however, to demonstrate that "you can have both", and
>hopefully other programmers will pick up on that and let more of us have
>the benefits of both.
>
And yes, you can have both: by default this feature is turned off to
keep compatibility with XML, as XML is better dealt with at the Infoset
level for most applications, but when it is enabled you will not miss
anything from the original XML input (if you *really* need to do
roundtripping ...).
thanks,
alek
--
"Mr. Pauli, we in the audience are all agreed that your theory is crazy.
What divides us is whether it is crazy enough to be true." Niels H. D. Bohr