xml-dev - RE: [xml-dev] Does SAX make sense?

RE: [xml-dev] Does SAX make sense?


To: <Patrick.Durusau@sbl-site.org>,<xml-dev@lists.xml.org>
Subject: RE: [xml-dev] Does SAX make sense?
From: "Joshua Allen" <joshuaa@microsoft.com>
Date: Wed, 28 May 2003 12:44:20 -0700
Cc: "zhengyu" <zhengyu@attbi.com>
Thread-index: AcMk/8TQqTwIjdgvRyiNBcc5bDNu3AAUKcRO
Thread-topic: [xml-dev] Does SAX make sense?

FWIW, this is exactly the sort of scenario we hoped to enable with the design of XPathNavigator in V1 of the .NET Framework (and I believe BEA has a similar thing called XmlCursor).

Basically, an XPathNavigator provides "virtual" random-access XML, and therefore can be implemented to do things like lazily load, use pre-built indexes, and so on.

Of course, one could always subclass XmlDocument (DOM) and implement lazy loading, but it's way easier with a cursor.
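
A minimal sketch of the cursor idea in Python, assuming a backing store that can return the child names of a node on demand. The names LazyNode, Cursor, load_children, STORE and LOADS are hypothetical; the move_* methods are only loosely modeled on the navigator style of API and are not the .NET signatures. The point is that the caller sees random access, while children are fetched only when the cursor first steps into a node.

```python
from typing import Callable, List, Optional


class LazyNode:
    """A node whose children are fetched from a loader on first access."""

    def __init__(self, name: str,
                 loader: Callable[[str], List[str]],
                 parent: Optional["LazyNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self._loader = loader
        self._children: Optional[List["LazyNode"]] = None  # None = not loaded

    def children(self) -> List["LazyNode"]:
        # Load at most once, and only when somebody actually asks.
        if self._children is None:
            self._children = [LazyNode(n, self._loader, self)
                              for n in self._loader(self.name)]
        return self._children


class Cursor:
    """Moves over LazyNode objects; each move reports success as a bool."""

    def __init__(self, root: LazyNode) -> None:
        self.current = root

    def move_to_first_child(self) -> bool:
        kids = self.current.children()
        if not kids:
            return False
        self.current = kids[0]
        return True

    def move_to_next(self) -> bool:
        parent = self.current.parent
        if parent is None:
            return False
        siblings = parent.children()  # already loaded, no new fetch
        i = siblings.index(self.current)
        if i + 1 >= len(siblings):
            return False
        self.current = siblings[i + 1]
        return True

    def move_to_parent(self) -> bool:
        if self.current.parent is None:
            return False
        self.current = self.current.parent
        return True


# Hypothetical backing store: child names per node, plus a log of fetches.
STORE = {"doc": ["head", "body"], "body": ["p", "note"]}
LOADS: List[str] = []


def load_children(name: str) -> List[str]:
    LOADS.append(name)
    return STORE.get(name, [])
```

Walking from "doc" to "body" and into its first child fetches only "doc" and "body"; the children of "head" are never requested because the cursor never steps into it.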

________________________________

From: Patrick Durusau [mailto:pdurusau@emory.edu]
Sent: Wed 5/28/2003 2:59 AM
To: xml-dev@lists.xml.org
Cc: zhengyu
Subject: Re: [xml-dev] Does SAX make sense?

Jimmy,

zhengyu wrote:

>I have got a weird question in mind that I would like to toss it out.
>
>Suppose there is a way to offer DOM type interface with SAX kind of
>efficiency.
>
Matthew O'Donnell and I have made a series of presentations on this
particular issue. Our latest proposal is known as JITTs
(Just-In-Time-Trees). You can find presentations/papers at the JITTs
homepage, http://www.jitts.org, or you can visit our homepage on
overlapping markup at http://www.sbl-site2.org/Overlap/.

The basic idea is that markup (and hence trees) is recognized as part
of the processing of a file and has no meaning for a parser until the
parser has been told to recognize that particular markup token.

What would be required is to change the order of processing used by most
(if not all) XML parsers: process the DTD/Schema first and use the
resulting tree as the basis for recognition of markup events by SAX.
(The SAX module would then recognize only the markup tokens in that
tree.) The only problem with that approach that has been suggested to us
involves directly nested elements, such as <div>blah, blah<div>blah,
blah</div>blah, blah</div>, but the incidence of such markup is unknown.
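
A rough simulation of selective recognition in Python, assuming the set of recognized element names has already been derived from the DTD/Schema (here it is simply passed in). It runs on top of a conventional SAX parser, so it does not change the order of processing as proposed above; it only shows what the event stream would look like. SelectiveHandler and recognize are hypothetical names, attributes on unrecognized elements are dropped for brevity, and matching is by element name only, so the directly nested case is not addressed.

```python
import xml.sax
from xml.sax.saxutils import escape


class SelectiveHandler(xml.sax.ContentHandler):
    """Report events only for recognized elements; keep the rest as text."""

    def __init__(self, recognized):
        super().__init__()
        self.recognized = set(recognized)
        self.events = []  # list of ("start" | "end" | "text", value)

    def _text(self, s):
        # Merge adjacent text so parser chunking does not affect the result.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + s)
        else:
            self.events.append(("text", s))

    def startElement(self, name, attrs):
        if name in self.recognized:
            self.events.append(("start", name))
        else:
            self._text("<%s>" % name)  # unrecognized: retained verbatim

    def endElement(self, name):
        if name in self.recognized:
            self.events.append(("end", name))
        else:
            self._text("</%s>" % name)

    def characters(self, content):
        # Re-escape so the retained text can be parsed again later.
        self._text(escape(content))


def recognize(xml_text, recognized):
    """Parse xml_text and return the events for the recognized names."""
    handler = SelectiveHandler(recognized)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

With only "doc" and "p" recognized, the <b> element inside a paragraph produces no events of its own and survives as part of the paragraph's text.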

The advantage of our approach is that a DomLite tree could be
constructed that retains the unrecognized markup (unlike a SAX filter)
and, upon retrieval of the container (recognized markup), the previously
unrecognized markup could be processed for presentation to the user.
Simulated tests of this type of processing indicate substantial gains
in processing speed over traditional construction of full DOM trees.
Another advantage is that it operates with standard XML syntax, unlike
some proposals, such as LMNL, which has its own (non-XML) format.
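
One way to picture such a container, as a sketch only and not the JITTs implementation: a node keeps its unrecognized inner markup as a raw string and builds a real subtree the first time it is retrieved. LiteNode and its members are hypothetical names, and the sketch assumes the retained markup is well-formed once wrapped in its container.

```python
import xml.etree.ElementTree as ET


class LiteNode:
    """Container that holds unrecognized inner markup until it is needed."""

    def __init__(self, name, raw_inner):
        self.name = name
        self.raw_inner = raw_inner  # unrecognized markup, kept verbatim
        self._parsed = None         # filled in on first retrieval

    @property
    def expanded(self):
        return self._parsed is not None

    def expand(self):
        # Parse the retained markup only once, on demand.
        if self._parsed is None:
            wrapped = "<%s>%s</%s>" % (self.name, self.raw_inner, self.name)
            self._parsed = ET.fromstring(wrapped)
        return self._parsed
```

A node built as LiteNode("p", "Hi <b>there</b>") costs nothing beyond storing the string until expand() is called, at which point the <b> element becomes an ordinary child.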

>How long would it take for the new processing model to become really
>popular?
> 
>
Well, it has not become popular (yet!) but the rise of partial parsing
XML parsers and the like indicates the need for something more
efficient than current processing models for XML. JITTs has been
criticized because it makes well-formedness a question that is answered
at the time of processing. Personally, I don't find well-formedness
apart from recognition at the time of processing by a parser all that
compelling (or even meaningful). There are substantial advantages to
meeting the requirements of well-formedness as part of processing.

I think the first successful JITTs parser that can be applied to large
documents (the subject of the usual posts to this list: "I have a 10 MB
document and need to build a DOM tree...") will force a change in the
current "markup recognition first, useful document processing later"
approach. The whole point of markup was to enable the processing of
documents, not to create artificial limitations to prevent it.

Patrick

>Jimmy
>----- Original Message -----
>From: "Karl Waclawek" <karl@waclawek.net>
>To: <xml-dev@lists.xml.org>
>Sent: Sunday, May 25, 2003 7:00 PM
>Subject: Re: [xml-dev] Does SAX make sense?
>
>
> 
>
>>>There are several implementations, but I don't know of any standard
>>>interface. I have been thinking that having a standard interface just
>>>for passing XPath expressions to an event parser would be great. Anyone
>>>know of a standard being worked, implementations, or interested in
>>>starting a working group? If so, I'm in.
>>>     
>>>
>>I am working on something similar, but much simpler right now.
>>My XPaths are just straight paths, or in other words,  element types.
>>
>>My initial plan was to build a DTD (or other schema) validator
>>(on top of SAX) which has callback hooks for custom validation
>>or processing. The callbacks are registered by the application
>>based on a path - but rather a path based on the schema object
>>model and not the document object model. Every node in the SOM
>>corresponds to a separate set of callbacks.
>>
>>So far I was not thinking of anything more complex, as I think
>>this would be quite an effort.
>>
>>Karl
>>
>>-----------------------------------------------------------------
>>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>initiative of OASIS <http://www.oasis-open.org>
>>
>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>>To subscribe or unsubscribe from this list use the subscription
>>manager: <http://lists.xml.org/ob/adm.pl>

--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Co-Editor, ISO 13250, Topic Maps -- Reference Model