   (more) extensible SAX

  • From: Eric van der Vlist <vdv@dyomedea.com>
  • To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
  • Date: Tue, 05 Dec 2000 04:37:06 +0100

Although this email would have been more timely a year ago, I'd like to
share some of my thoughts about SAX and one way to make it more easily
extensible.

First, to set up the context, I'd like to say a word about what I think
is the most important difference between SAX and other APIs (like DOM). 

In most of the papers I have read, SAX is contrasted with DOM as push
versus pull.

While this is certainly an important difference, I don't see it as the
main one. I'd rather say that the main difference is that SAX and DOM
act at different levels, and that SAX is the more "neutral" interface,
DOM being more biased by a specific interpretation of what an XML
document is.

What makes SAX unique is that no (or very few) assumptions are made
about the way the information will be used: it is presented almost raw
to the application.

While an application using a DOM interface has to reinterpret the
information stored in the DOM, and often to translate its structure,
the same application using SAX only has to create its own object
model from the raw information.
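
As a rough illustration of this point, here is a minimal sketch of an
application that builds its own objects directly from SAX2 events, with
no intermediate tree. The Order and Item classes, the element names and
the attribute names are all assumptions made up for the example; only
the org.xml.sax and javax.xml.parsers calls are standard.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical application model; nothing here is prescribed by SAX.
class Item {
    final String sku;
    final int quantity;

    Item(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }
}

class Order {
    String id;
    final List<Item> items = new ArrayList<>();
}

// Fills the model straight from the raw events reported by the parser.
class OrderHandler extends DefaultHandler {
    final Order order = new Order();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // The default factory is not namespace aware, so we match on qName.
        if ("order".equals(qName)) {
            order.id = atts.getValue("id");
        } else if ("item".equals(qName)) {
            order.items.add(new Item(atts.getValue("sku"),
                                     Integer.parseInt(atts.getValue("qty"))));
        }
    }

    // Convenience entry point: parses a string and returns the model.
    static Order parse(String xml) {
        try {
            OrderHandler handler = new OrderHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.order;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse order", e);
        }
    }
}
```

The handler never sees a node tree: each event is consumed once and
turned into whatever structure the application actually wants.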

This is true of "data oriented" applications and can even be true of
document oriented applications, XSLT processors being a good example of
applications that can increase their performance by using their own
specific object models rather than a standard DOM.

Now, I'd like to go on by explaining what I think are the two weaknesses
of SAX.

The first is that the information isn't raw enough for some
applications: some information is still lost in the interpretation
that is done (for example, you can't access information about parsed
entities, as discussed in one of my articles [1] on XML.com).

The second (and almost opposite) one is that in some cases there isn't
enough interpretation. The way SAX1 had to be modified to support
namespaces is a good example of this, and the problem is likely to
happen again as long as new features are added to XML 1.0 through
modularization.

I think that both come from a quest to find a balance and to define an
API that will meet most needs (I could call it the "one size fits all"
utopia), and that this issue should be addressed by adding more
modularity and layering rather than by adding more complexity to
existing methods.

The way SAX2 handles namespaces shows, IMHO, how difficult it will be
to extend its features.

I find it really worrying that, to expose more information about a
simple "startElement", we needed to change the API and add new
parameters to the method.

I think this is a good justification for hiding the complexity of the
XML productions within objects.

What do I mean, concretely?

Instead of:

startElement(java.lang.String namespaceURI,
	java.lang.String localName,
	java.lang.String qName,
	Attributes atts)
	throws SAXException

I would have far preferred to have:

startElement(org.xml.sax.StartElement start)
        throws SAXException

Where the StartElement class would have been extensible by adding new
methods rather than by modifying existing ones and could potentially
have provided all the available information about the tag.
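
To make the idea a little more concrete, here is a minimal sketch of
what such an event object and its handler could look like. Everything
in it is hypothetical: SAX2 defines no StartElement class, and the
accessor names, the StartElementHandler interface and the NameCollector
example are assumptions chosen for illustration. Exception handling is
left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event object: SAX2 does not define a StartElement class.
class StartElement {
    private final String namespaceURI;
    private final String localName;
    private final String qName;
    private final Map<String, String> attributes;

    StartElement(String namespaceURI, String localName, String qName,
                 Map<String, String> attributes) {
        this.namespaceURI = namespaceURI;
        this.localName = localName;
        this.qName = qName;
        // Defensive copy so later changes by the caller do not leak in.
        this.attributes = new LinkedHashMap<>(attributes);
    }

    String getNamespaceURI() { return namespaceURI; }
    String getLocalName() { return localName; }
    String getQName() { return qName; }

    // Returns the attribute value, or null when the attribute is absent.
    String getAttribute(String name) { return attributes.get(name); }
}

// Hypothetical handler interface: a single parameter, whatever the
// event object exposes now or in the future.
interface StartElementHandler {
    void startElement(StartElement start);
}

// Example handler that only cares about qualified names.
class NameCollector implements StartElementHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(StartElement start) {
        names.add(start.getQName());
    }
}
```

A handler such as NameCollector only calls the accessors it needs, so
adding another accessor to StartElement later would not change the
startElement signature or break existing handlers.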

Without such a mechanism, I am afraid that to support feature X or Y
(think of xml:base or xml:lang, for instance), you'll need to add more
parameters to the startElement method.

This model would also make it possible to provide the full text of the
opening tag to the tools that might need it (for instance an XML editor
that would like to preserve its formatting).
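
A small sketch of that use case, under the assumption that the parser
hands over the tag exactly as written: the SourceStartElement and
FormatPreservingWriter names are invented for this example and exist in
no SAX release.

```java
// Hypothetical event that remembers the tag exactly as it appeared
// in the source document, whitespace and quoting included.
class SourceStartElement {
    private final String qName;
    private final String sourceText;

    SourceStartElement(String qName, String sourceText) {
        this.qName = qName;
        this.sourceText = sourceText;
    }

    String getQName() { return qName; }
    String getSourceText() { return sourceText; }
}

// An editor-like consumer that writes opening tags back out unchanged.
class FormatPreservingWriter {
    private final StringBuilder out = new StringBuilder();

    void startElement(SourceStartElement start) {
        out.append(start.getSourceText());
    }

    String getOutput() { return out.toString(); }
}
```

Tools that do not care about the original formatting would simply
ignore getSourceText() and keep using the interpreted accessors.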

It would also help solve the issue of scoped nodes that I recently
posted about on xml-dev [2].

Last point: why do I call it a layered interface?

Because we could define on top of this a layered architecture where a
single event would get richer with each layer it passes through.

The first layer could be the recognition of the basic XML productions.

A second layer could add entity processing and well-formedness checking.

The next layers would add namespaces and scoped attributes.

The same object (startElement for instance) could go through the
different layers and gain pieces of interpretation and information
without losing its original info, just by being used to create an
object from an extended class at each step.
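
Here is a minimal sketch of two such layers, assuming a first layer
that reports only the qualified name and the attributes as written, and
a namespace layer that builds an object of an extended class from it.
All class names are hypothetical, and the namespace resolution is
deliberately simplified (an unbound prefix just yields an empty URI
instead of an error).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// First layer event (hypothetical): only what the basic XML
// productions provide.
class RawStartElement {
    private final String qName;
    private final Map<String, String> attributes;

    RawStartElement(String qName, Map<String, String> attributes) {
        this.qName = qName;
        this.attributes = new LinkedHashMap<>(attributes);
    }

    String getQName() { return qName; }
    Map<String, String> getAttributes() { return attributes; }
}

// Namespace layer event (hypothetical): extends the raw event, so the
// original information stays available next to the new interpretation.
class NamespacedStartElement extends RawStartElement {
    private final String namespaceURI;
    private final String localName;

    NamespacedStartElement(RawStartElement raw, String namespaceURI,
                           String localName) {
        super(raw.getQName(), raw.getAttributes());
        this.namespaceURI = namespaceURI;
        this.localName = localName;
    }

    String getNamespaceURI() { return namespaceURI; }
    String getLocalName() { return localName; }
}

// Namespace layer (hypothetical): turns raw events into enriched ones.
class NamespaceLayer {
    // One prefix-to-URI map per open element; innermost scope on top.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    NamespacedStartElement startElement(RawStartElement raw) {
        // Start from the enclosing scope, then apply local declarations.
        Map<String, String> scope = new HashMap<>();
        if (!scopes.isEmpty()) {
            scope.putAll(scopes.peek());
        }
        for (Map.Entry<String, String> a : raw.getAttributes().entrySet()) {
            if (a.getKey().equals("xmlns")) {
                scope.put("", a.getValue());
            } else if (a.getKey().startsWith("xmlns:")) {
                scope.put(a.getKey().substring(6), a.getValue());
            }
        }
        scopes.push(scope);

        String qName = raw.getQName();
        int colon = qName.indexOf(':');
        String prefix = colon < 0 ? "" : qName.substring(0, colon);
        String localName = qName.substring(colon + 1);
        // Simplification: an unbound prefix maps to the empty string.
        String uri = scope.getOrDefault(prefix, "");
        return new NamespacedStartElement(raw, uri, localName);
    }

    // Must be called for each end tag so that declarations go out of scope.
    void endElement() {
        scopes.pop();
    }
}
```

A consumer written against RawStartElement keeps working when handed a
NamespacedStartElement, while a namespace-aware consumer can use the
extra accessors; further layers could extend the class again in the
same way.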

I had a look at Aelfred and XP and both are more or less implementing
this kind of layering, even though it's not that clearly separated and
it's using internal proprietary interfaces.

I don't see anything but advantages, one of them being extensibility:
with this architecture, SAX2 would just have been a layer on top of

Have I missed something?



[1] http://www.xml.com/pub/a/2000/08/09/xslt/xslt.html
[2] http://lists.xml.org/archives/xml-dev/200011/msg00551.html
See you at XML 2000
Eric van der Vlist       Dyomedea                    http://dyomedea.com
http://xmlfr.org         http://4xt.org              http://ducotede.com

