xml-dev - Re: [xml-dev] Does SAX make sense?

To: Joshua Allen <joshuaa@microsoft.com>
Subject: Re: [xml-dev] Does SAX make sense?
From: Patrick Durusau <pdurusau@emory.edu>
Date: Fri, 30 May 2003 09:25:04 -0400
Cc: Patrick.Durusau@sbl-site.org, xml-dev@lists.xml.org, zhengyu<zhengyu@attbi.com>
References: <4F4182C71C1FDD4BA0937A7EB7B8B4C1061BAC5A@red-msg-08.redmond.corp.microsoft.com>
Reply-to: Patrick.Durusau@sbl-site.org
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02

Joshua,

Joshua Allen wrote:

>FWIW, this is exactly the sort of scenario we hoped to enable with the design of XPathNavigator in V1 of the .NET Frameworks (and I believe BEA has a similar thing called XmlCursor).
> 
>
Interesting and I can see how your approach would be quite useful.

This is not exactly the same as the JITTs proposal. JITTs proposes that
we change the underlying parsing of XML (while not changing XML syntax)
to allow for a declaration of which tokens are recognized as syntax and,
by extension, which trees are recognized (we say trees are "asserted"
about documents). This rather neatly solves the problem of overlapping
markup in a document.
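
To make the idea concrete, here is a minimal sketch in Python. It is not the JITTs implementation; the names tokenize and is_well_formed, the attribute-free tag syntax, and the sample text are all assumptions made for illustration. Only tags named in the "recognized" set become markup events, and everything else stays in the text, so two overlapping hierarchies can each be asserted separately over the same document.

```python
import re

# Assumption for this sketch: tags carry no attributes, e.g. <l> or </l>.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def tokenize(text, recognized):
    """Yield ("start" | "end" | "text", value) events.

    Only tags whose names are in `recognized` are treated as markup;
    any other tag is left in place as ordinary character data.
    """
    pending = ""
    pos = 0
    for match in TAG.finditer(text):
        pending += text[pos:match.start()]
        pos = match.end()
        closing, name = match.groups()
        if name in recognized:
            if pending:
                yield ("text", pending)
                pending = ""
            yield ("end" if closing else "start", name)
        else:
            # Unrecognized markup is retained verbatim as text.
            pending += match.group(0)
    pending += text[pos:]
    if pending:
        yield ("text", pending)

def is_well_formed(events):
    """Check that the recognized start/end events nest properly."""
    stack = []
    for kind, value in events:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            if not stack or stack.pop() != value:
                return False
    return not stack

# Lines (<l>) and sentences (<s>) overlap in this made-up sample.
doc = "<l><s>Alpha beta.</s> <s>Gamma</l><l> delta.</s></l>"

sentences_ok = is_well_formed(tokenize(doc, {"s"}))       # True
lines_ok = is_well_formed(tokenize(doc, {"l"}))           # True
both_ok = is_well_formed(tokenize(doc, {"s", "l"}))       # False
```

In this sketch, well-formedness is answered per set of recognized tokens, at the time of processing, rather than once for the whole file.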

Random access is a valuable step forward but new parsing models for XML 
are (in my opinion) more likely to take us to the next level of 
usefulness for XML documents.

Hope you are having a great day!

Patrick

>Basically, an XPathNavigator provides "virtual" random-access XML, and therefore can be implemented to do things like lazily load, use pre-built indexes, and so on.
> 
>Of course, one could always subclass XmlDocument (DOM) and implement lazy loading, but it's way easier with a cursor.
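
A rough sketch of the cursor idea described above, in Python rather than .NET. This is not the XPathNavigator API; LazyNode, Cursor, and the move_to_* method names are hypothetical and only illustrate how a cursor can hide lazy loading behind random-access style navigation.

```python
class LazyNode:
    """A node whose children are fetched only when first asked for."""

    def __init__(self, name, load_children=None):
        self.name = name
        self.parent = None
        self._load_children = load_children or (lambda: [])
        self._children = None  # None means "not loaded yet"

    @property
    def children(self):
        if self._children is None:
            self._children = list(self._load_children())
            for child in self._children:
                child.parent = self
        return self._children


class Cursor:
    """Moves over LazyNode objects; each move returns True on success."""

    def __init__(self, node):
        self.node = node

    def move_to_first_child(self):
        kids = self.node.children  # may trigger a load
        if not kids:
            return False
        self.node = kids[0]
        return True

    def move_to_next(self):
        parent = self.node.parent
        if parent is None:
            return False
        siblings = parent.children
        index = siblings.index(self.node)
        if index + 1 >= len(siblings):
            return False
        self.node = siblings[index + 1]
        return True

    def move_to_parent(self):
        if self.node.parent is None:
            return False
        self.node = self.node.parent
        return True


loads = []  # records which nodes had their children loaded

def children_of_catalog():
    loads.append("catalog")
    return [LazyNode("book"), LazyNode("cd")]

cursor = Cursor(LazyNode("catalog", children_of_catalog))
# Nothing has been loaded yet; the first move triggers the load.
moved = cursor.move_to_first_child()
```

The caller only ever sees the cursor, so the loader could equally read from disk or from a pre-built index without changing the calling code.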
>
>________________________________
>
>From: Patrick Durusau [mailto:pdurusau@emory.edu]
>Sent: Wed 5/28/2003 2:59 AM
>To: xml-dev@lists.xml.org
>Cc: zhengyu
>Subject: Re: [xml-dev] Does SAX make sense?
>
>
>
>Jimmy,
>
>zhengyu wrote:
>
>  
>
>>I have a weird question in mind that I would like to toss out.
>>
>>Suppose there is a way to offer DOM type interface with SAX kind of
>>efficiency.
>>
>>    
>>
>Matthew O'Donnell and I have made a series of presentations on this
>particular issue. Our latest proposal is known as JITTs
>(Just-In-Time-Trees). You can find presentations and papers at the
>JITTs homepage, http://www.jitts.org, or you can visit our homepage on
>overlapping markup at http://www.sbl-site2.org/Overlap/.
>
>The basic idea is that markup (and hence trees) is recognized as part
>of the processing of a file, and a markup token has no meaning for a
>parser until the parser has been told to recognize that particular
>token.
>
>What would be required is to change the order of processing used by most
>(if not all) XML parsers: process the DTD/Schema first and use the
>resulting tree as the basis for recognition of markup events by SAX.
>(The SAX module then recognizes only the markup tokens in the tree.) The
>only problem with that approach that has been suggested to us involves
>directly nested elements, such as <div>blah, blah<div>blah,
>blah</div>blah, blah</div>, but the incidence of such markup is unknown.
>
>The advantage of our approach is that a DomLite tree could be
>constructed that retains the unrecognized markup (unlike a SAX filter),
>and upon retrieval of the container (recognized markup), the previously
>unrecognized markup could be processed for presentation to the user.
>Simulated tests of this type of processing indicate substantial gains
>in processing speed over traditional construction of full DOM trees.
>Another advantage is that it operates with standard XML syntax, unlike
>some proposals, such as LMNL, which has its own (non-XML) format.
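
One way to picture the DomLite idea, as a hedged Python sketch and not the authors' code: LazyContainer and split_containers are hypothetical names. The first pass recognizes a single container element and keeps its contents as raw text; the inner markup is parsed only when a container is actually retrieved. The regular expression assumes containers are not directly nested, which is exactly the limitation mentioned above.

```python
import re
import xml.etree.ElementTree as ET


class LazyContainer:
    """Holds the raw contents of one recognized element.

    The markup inside is left unparsed until tree() is called.
    """

    def __init__(self, name, raw_inner):
        self.name = name
        self.raw_inner = raw_inner
        self._tree = None

    def tree(self):
        if self._tree is None:
            # Only now is the previously unrecognized markup processed.
            self._tree = ET.fromstring(
                f"<{self.name}>{self.raw_inner}</{self.name}>"
            )
        return self._tree


def split_containers(text, name):
    """First pass: recognize only <name>...</name>, keep the rest raw.

    Assumption: containers carry no attributes and are never nested
    inside one another.
    """
    tag = re.escape(name)
    pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    return [LazyContainer(name, m.group(1)) for m in pattern.finditer(text)]


doc = (
    "<book>"
    "<chapter><p>One <em>two</em></p></chapter>"
    "<chapter><p>Three</p></chapter>"
    "</book>"
)
chapters = split_containers(doc, "chapter")
emphasis = chapters[0].tree().find("p/em").text  # parses chapter 1 only
```

The second chapter is never parsed unless someone asks for it, which is where any saving over building a full DOM tree would come from.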
>
>  
>
>>How long would it take for the new processing model to become really
>>popular?
>>
>>
>>    
>>
>Well, it has not become popular (yet!) but the rise of partial parsing
>XML parsers and the like indicates the need for something more
>efficient than current processing models for XML. JITTs has been
>criticized because it makes well-formedness a question that is answered
>at the time of processing. Personally, I don't find well-formedness
>apart from recognition at the time of processing by a parser all that
>compelling (or even meaningful). There are substantial advantages to
>meeting the requirements of well-formedness as part of processing.
>
>I think the first successful JITTs parser that can be applied to large
>documents (the subject of the usual posts to this list: "I have a 10 MB
>document and need to build a DOM tree...") will force a change in the
>current "markup recognition first, useful document processing later"
>approach. The whole point of markup was to enable the processing of
>documents, not to create artificial limitations to prevent it.
>
>Patrick
>
>  
>
>>Jimmy
>>----- Original Message -----
>>From: "Karl Waclawek" <karl@waclawek.net>
>>To: <xml-dev@lists.xml.org>
>>Sent: Sunday, May 25, 2003 7:00 PM
>>Subject: Re: [xml-dev] Does SAX make sense?
>>
>>
>>
>>
>>    
>>
>>>>There are several implementations, but I don't know of any standard
>>>>interface. I have been thinking that having a standard interface just
>>>>for passing XPath expressions to an event parser would be great. Does
>>>>anyone know of a standard being worked on, of implementations, or of
>>>>anyone interested in starting a working group? If so, I'm in.
>>>>    
>>>>
>>>>        
>>>>
>>>I am working on something similar, but much simpler, right now.
>>>My XPaths are just straight paths, or in other words, element types.
>>>
>>>My initial plan was to build a DTD (or other schema) validator
>>>(on top of SAX) which has callback hooks for custom validation
>>>or processing. The callbacks are registered by the application
>>>based on a path, but a path through the schema object model (SOM)
>>>rather than the document object model. Every node in the SOM
>>>corresponds to a separate set of callbacks.
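
A small Python sketch of path-registered callbacks on top of SAX, using the standard xml.sax module. It is not Karl's code: PathDispatcher and register are hypothetical names, and the sketch keys callbacks on the straight path of element types seen in the document, as a stand-in for nodes of a schema object model.

```python
import xml.sax


class PathDispatcher(xml.sax.ContentHandler):
    """Calls registered callbacks when an element's path matches."""

    def __init__(self):
        super().__init__()
        self._stack = []      # names of the currently open elements
        self._callbacks = {}  # path string -> list of callables

    def register(self, path, callback):
        """Register callback(path, attributes) for a path like /a/b."""
        self._callbacks.setdefault(path, []).append(callback)

    def startElement(self, name, attrs):
        self._stack.append(name)
        path = "/" + "/".join(self._stack)
        for callback in self._callbacks.get(path, []):
            callback(path, dict(attrs.items()))

    def endElement(self, name):
        self._stack.pop()


seen = []
dispatcher = PathDispatcher()
dispatcher.register("/order/item", lambda path, attrs: seen.append(attrs["sku"]))

xml.sax.parseString(
    b"<order>"
    b"<item sku='a1'/>"
    b"<note><item sku='x'/></note>"
    b"<item sku='b2'/>"
    b"</order>",
    dispatcher,
)
# seen == ["a1", "b2"]; the item under <note> has a different path.
```

Keying on the full path rather than on the element name alone is what lets the same element type trigger different callbacks in different contexts.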
>>>
>>>So far I was not thinking of anything more complex, as I think
>>>this would be quite an effort.
>>>
>>>Karl
>>>
>>>-----------------------------------------------------------------
>>>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>>initiative of OASIS <http://www.oasis-open.org>
>>>
>>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>>
>>>To subscribe or unsubscribe from this list use the subscription
>>>manager: <http://lists.xml.org/ob/adm.pl>
>>>
>>>  
>>>
>>>      
>>>
>
>--
>Patrick Durusau
>Director of Research and Development
>Society of Biblical Literature
>Patrick.Durusau@sbl-site.org
>Co-Editor, ISO 13250, Topic Maps -- Reference Model

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Co-Editor, ISO 13250, Topic Maps -- Reference Model
