OASIS Mailing List Archives

   Re: [xml-dev] Partial documents in tree-based APIs


i think this is generally called "pull" DOM and i believe it was 
implemented in Python first (!?).

in Java i wrote XPP2 (available since August 2001). as part of its 
implementation it has XmlPullNode, which allows not only building the 
tree incrementally but even going back to the underlying parser and 
parsing XML directly for parts of the tree (or skipping parts of the 
XML that are not needed).
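
a minimal sketch of that idea follows. it uses the standard StAX API 
(javax.xml.stream) as a stand-in and is not the XPP2 XmlPullNode API; 
the class name PullTreeSketch and its methods are made up for 
illustration. it builds a DOM subtree only for the one child element the 
caller asks for, skips the siblings before it at the parser level, and 
never reads anything after it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration class; StAX stands in for the XPP2 pull parser.
class PullTreeSketch {

    // Build a DOM subtree for the element whose start tag the reader is on.
    // On return the reader is positioned on the matching end tag.
    static Element buildSubtree(XMLStreamReader r, Document doc) throws XMLStreamException {
        Element el = doc.createElement(r.getLocalName());
        for (int i = 0; i < r.getAttributeCount(); i++) {
            el.setAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
        }
        while (true) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                el.appendChild(buildSubtree(r, doc));
            } else if (ev == XMLStreamConstants.CHARACTERS || ev == XMLStreamConstants.CDATA) {
                el.appendChild(doc.createTextNode(r.getText()));
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                return el;
            }
            // comments, processing instructions etc. are ignored in this sketch
        }
    }

    // Consume the current element without building any nodes for it.
    static void skipSubtree(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) depth++;
            else if (ev == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    // Return a tree for the first child of the root named `wanted`, or null if
    // the root closes first. Parsing stops as soon as that child is complete.
    static Element firstChildAsTree(String xml, String wanted) throws Exception {
        XMLStreamReader r =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            Document doc =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            r.nextTag(); // advance to the root start tag
            while (true) {
                int ev = r.next();
                if (ev == XMLStreamConstants.START_ELEMENT) {
                    if (wanted.equals(r.getLocalName())) {
                        return buildSubtree(r, doc);
                    }
                    skipSubtree(r); // not needed: no nodes are built for it
                } else if (ev == XMLStreamConstants.END_ELEMENT) {
                    return null; // end of root, nothing matched
                }
            }
        } finally {
            r.close();
        }
    }
}
```

calling PullTreeSketch.firstChildAsTree(xml, "header") on a message 
returns the header as a small tree. because the reader is not advanced 
past the header's end tag, a well-formedness error later in the message 
is not encountered by this call, which is the "envelope only" case from 
the quoted message below.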

this is a very flexible and powerful approach to processing and 
dispatching XML messages. parsing _and_ tree building can start as soon 
as the first XML start tag is received, and the process is easily 
monitored, so the application builds only as much of the XML tree as it 
needs. this has very positive implications for performance; see the 
results for XPP pull compared with other XML tree APIs in Java: 

i now work on XB1, which is the direct successor of XPP2 XmlPullNode but 
has an easier-to-use API (the XPP2 API was very minimal, even _ascetic_) 
and directly models the XML infoset in Java.



Elliotte Rusty Harold wrote:

> Streaming APIs like SAX and XMLPULL by their nature provide some of 
> the content of a malformed document to the client application before 
> the first well-formedness error is detected. The XML specification 
> implicitly says this is OK, though in some use-cases roll-back or 
> failure to commit may be desirable.
>
> Now consider the case of a tree-based API such as DOM, JDOM, or XOM 
> which encounters a malformedness error. Traditionally, these APIs have 
> reported no information from a malformed document to the client 
> application. However, recently Laurent Bihanic submitted a patch to 
> JDOM in which as much of the document as had been able to be 
> successfully parsed was made available through the exception that was 
> thrown to indicate the malformedness error. This was quite clever. It 
> had never occurred to me, and I had never noticed any other API do 
> anything similar.
>
> What I'd like to get broader discussion of is whether this is a good 
> idea. There are certainly use cases for it. Bihanic wanted to read the 
> envelope of an XML message even if the data was malformed. However, 
> there are also problems. For instance, if the well-formedness error is 
> a missing end-tag, then the element with the missing end-tag will 
> still appear in the partial tree. And if the problem is a missing root 
> element, then this may produce a Document object with no root element. 
> On the other hand, rollback, failure to commit, or simply ignoring the 
> malformed document is much easier than with a streaming API since you 
> know in advance that the document is malformed.
>
> Is this approach something to be encouraged? Should other tree-based 
> APIs like XOM and DOM copy this innovation? What advantages and 
> disadvantages have I not thought of?

"Mr. Pauli, we in the audience are all agreed that your theory is crazy. 
What divides us is whether it is crazy enough to be true." Niels H. D. Bohr



Copyright 2001 XML.org. This site is hosted by OASIS