
   Re: [xml-dev] Does SAX make sense?


Jimmy,

zhengyu wrote:

>I have a weird question in mind that I would like to toss out.
>
>Suppose there is a way to offer DOM type interface with SAX kind of
>efficiency.
>
Matthew O'Donnell and I have made a series of presentations on this 
particular issue. Our latest proposal is known as JITTs 
(Just-In-Time-Trees); presentations and papers are available at the 
JITTs homepage, http://www.jitts.org, or at our homepage on 
overlapping markup, http://www.sbl-site2.org/Overlap/.

The basic idea is that markup (and hence trees) is recognized as part 
of the processing of a file and has no meaning for a parser until the 
parser has been told to recognize that particular markup token.

What would be required is to change the order of processing used by most 
(if not all) XML parsers: process the DTD/Schema first and use 
the resulting tree as the basis for recognition of markup events by SAX. 
(The SAX module would then recognize only markup tokens in the tree.) The 
only problem with that approach that has been suggested to us involves 
directly nested elements, such as <div>blah, blah<div>blah, 
blah</div>blah, blah</div>, but the incidence of such markup is unknown.
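
As a rough illustration of "recognize only the markup you have been told 
about" (a sketch, not the JITTs implementation), the Python below scans a 
document and reports events only for element names in a given set. The 
function name recognized_events and the set of names are assumptions for 
the example; in the scheme described above the set would come from the 
DTD/Schema pass. Every other tag stays in the text, untouched.

```python
import re

def recognized_events(text, recognized):
    """Yield ('start', name), ('end', name) and ('text', s) events.

    Only tags whose element name is in `recognized` count as markup;
    every other tag is passed through untouched inside a text event.
    Simplification: attributes are skipped and empty-element tags
    (<x/>) are not handled.
    """
    if not recognized:
        if text:
            yield ("text", text)
        return
    # Longest names first so the alternation reads predictably.
    names = "|".join(
        re.escape(n) for n in sorted(recognized, key=len, reverse=True)
    )
    tag = re.compile(r"<(/?)(%s)(?:\s[^<>]*)?>" % names)
    pos = 0
    for m in tag.finditer(text):
        if m.start() > pos:
            yield ("text", text[pos:m.start()])
        yield ("end" if m.group(1) else "start", m.group(2))
        pos = m.end()
    if pos < len(text):
        yield ("text", text[pos:])

# Hypothetical input: pretend the DTD pass said to recognize only <p>.
doc = "<doc><p>Hello <b>world</b></p></doc>"
events = list(recognized_events(doc, {"p"}))
```

Here <doc> and <b> are never reported as markup; they travel along as 
ordinary text until something asks for them. For directly nested elements 
such as the <div> case, this scanner simply emits a start and an end event 
for each tag; pairing them up is left to whatever consumes the events.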

The advantage to our approach is that a DomLite tree could be 
constructed that retains the unrecognized markup (unlike a SAX filter) 
and, upon retrieval of the container (recognized markup), the previously 
unrecognized markup could be processed for presentation to the user. 
Simulated tests of this type of processing indicate substantial gains 
in processing speed over traditional construction of full DOM trees. 
Another advantage is that it operates with standard XML syntax, unlike 
some proposals, such as LMNL, which has its own (non-XML) format.
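
To make the "retain now, process on retrieval" idea concrete, here is a 
minimal Python sketch. It is not the DomLite design; the class name 
LazyNode and its methods are invented for the example. A node keeps the 
inner markup of a recognized container as a raw string and builds a tree 
from it only when the container is first retrieved. The deferred parse 
uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

class LazyNode:
    """A recognized container whose inner markup is parsed on demand."""

    def __init__(self, name, raw):
        self.name = name
        self._raw = raw      # inner markup kept verbatim, not yet parsed
        self._tree = None    # filled in on first retrieval

    @property
    def parsed(self):
        return self._tree is not None

    def tree(self):
        # Parse the retained markup the first time it is asked for.
        # Any well-formedness error in it surfaces here, not earlier.
        if self._tree is None:
            self._tree = ET.fromstring(
                "<%s>%s</%s>" % (self.name, self._raw, self.name)
            )
        return self._tree

# Hypothetical container: <p> was recognized, <b> was not.
node = LazyNode("p", "Hello <b>world</b>")
before = node.parsed            # nothing has been parsed yet
bold = node.tree().find("b")    # retrieval triggers the parse
after = node.parsed
```

A container that is never retrieved is never parsed, which is where any 
saving over building a full tree up front would come from.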

>How long would it take for the new processing model to become really
>popular?
>  
>
Well, it has not become popular (yet!) but the rise of partial parsing 
XML parsers and the like indicates the need for something more 
efficient than current processing models for XML. JITTs has been 
criticized because it makes well-formedness a question that is answered 
at the time of processing. Personally, I don't find well-formedness 
apart from recognition at the time of processing by a parser all that 
compelling (or even meaningful). There are substantial advantages to 
meeting the requirements of well-formedness as part of processing.

I think the first successful JITTs parser that can be applied to large 
documents (the subject of the usual posts to this list: "I have a 10 MB 
document and need to build a DOM tree...") will force a change in the 
current "markup recognition first, useful document processing later" 
approach. The whole 
point of markup was to enable the processing of documents, not to create 
artificial limitations to prevent it.

Patrick

>Jimmy
>----- Original Message -----
>From: "Karl Waclawek" <karl@waclawek.net>
>To: <xml-dev@lists.xml.org>
>Sent: Sunday, May 25, 2003 7:00 PM
>Subject: Re: [xml-dev] Does SAX make sense?
>
>
>  
>
>>>There are several implementations, but I don't know of any standard
>>>interface. I have been thinking that having a standard interface just
>>>for passing XPath expressions to an event parser would be great. Anyone
>>>know of a standard being worked, implementations, or interested in
>>>starting a working group? If so, I'm in.
>>>      
>>>
>>I am working on something similar, but much simpler right now.
>>My XPaths are just straight paths, or in other words,  element types.
>>
>>My initial plan was to build a DTD (or other schema) validator
>>(on top of SAX) which has callback hooks for custom validation
>>or processing. The callbacks are registered by the application
>>based on a path - but rather a path based on the schema object
>>model and not the document object model. Every node in the SOM
>>corresponds to a separate set of callbacks.
>>
>>So far I was not thinking of anything more complex, as I think
>>this would be quite an effort.
>>
>>Karl
>>
>>-----------------------------------------------------------------
>>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>initiative of OASIS <http://www.oasis-open.org>
>>
>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>>To subscribe or unsubscribe from this list use the subscription
>>manager: <http://lists.xml.org/ob/adm.pl>
>>
>>    
>>
>
>

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Co-Editor, ISO 13250, Topic Maps -- Reference Model



Copyright 2001 XML.org. This site is hosted by OASIS