   Re: (more) extensible SAX

  • From: Ken MacLeod <ken@bitsko.slc.ut.us>
  • To: xml-dev@lists.xml.org
  • Date: Tue, 05 Dec 2000 10:58:59 -0600

Eric van der Vlist <vdv@dyomedea.com> writes:

> First, to set up the context, I'd like to say a word about what I
> think is the most important difference between SAX and other APIs
> (like DOM).
> In most of the papers I can read, SAX is opposed to DOM as a pull
> versus push.
> While this is certainly an important difference, I don't see it as
> the main difference, but I'd rather say that the main difference is
> that SAX and DOM are acting at different levels and that SAX is the
> most "neutral" interface, DOM being more biased by a specific
> interpretation of what is a XML document.

I'll start off by saying I disagree a little with your premise, but
agree with your conclusion :-)

SAX and DOM both present the XML InfoSet: SAX in parse sequence (as a
stream) and DOM after parsing is complete (as a tree).  Some libraries
offer a mix of the two.
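
To make the stream/tree point concrete, here is a minimal sketch using the
JAXP classes that ship with the JDK.  The class name, method names, and
sample document are made up for illustration.  It reads the same document
once through SAX events and once by walking a finished DOM tree, and
collects the element names each way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamVersusTree {
    // Illustrative sample document (an assumption, not from the thread).
    static final String XML = "<doc><a>one</a><b><c/></b></doc>";

    // SAX: element names arrive as events while the parser is still reading.
    static List<String> namesFromSax(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    // DOM: the whole tree is built first, then walked in document order.
    static List<String> namesFromDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> names = new ArrayList<>();
        collect(doc.getDocumentElement(), names);
        return names;
    }

    // Depth-first walk that records every element node it meets.
    static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX: " + namesFromSax(XML));
        System.out.println("DOM: " + namesFromDom(XML));
    }
}
```

Both methods should report the same names in the same order; the only
difference is when the information becomes available.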

> Now, I'd like to go on by explaining what I think are the two
> weaknesses of SAX.
> The first of them is that the information isn't raw enough for some
> applications and that there is still an information loss in the
> interpretation that is done (an example is the fact that you can't
> access information about parsed entities as discussed in one of my
> articles [1] on XML.com).

Translating, what you're saying here is that the XML InfoSet only shows
the logical model of XML, and not the physical.  Some people want the
original information as well, or a "more complete XML InfoSet".  (This
particular topic has been discussed in these terms before, often in
contrast to an "even simpler XML InfoSet".)

> [snip]

> I think that both are coming from a quest to find a balance and to
> define an API that will meet most of the needs (I could call it the
> "one fits all" utopia) and that this issue should be addressed by
> adding more modularity and layering rather than by adding more
> complexity to existing methods.
> The way SAX2 is handling namespaces is showing, IMHO, how difficult
> it will be to extend its features.
> I find the fact that to expose more information about a simple
> "startElement" we have needed to change the API to add new
> parameters to the methods really worrying.
> I think this would be a good justification to hide the complexity of
> the XML productions within objects.
> What do I mean concretely ?
> Instead of:
> startElement(java.lang.String namespaceURI,
> 	java.lang.String localName,
> 	java.lang.String qName,
> 	Attributes atts)
> 	throws SAXException
> I would have far preferred to have:
> startElement(org.xml.sax.StartElement start)
>         throws SAXException
> Where the StartElement class would have been extensible by adding
> new methods rather than by modifying existing ones and could
> potentially have provided all the available information about the
> tag.
> Without such a mechanism, I am afraid that to support feature X or Y
> (think of xml:base of xml:lang for instance), you'll need to add
> more parameters to the startElement method.
> This model would also allow to provide the full text of the opening
> tag to the tools that might need it (for instance a XML editor that
> would like to preserve its format).
> It would help solving the issue of scoped nodes that I have recently
> posted on xml-dev [2].

Coincidentally, this is one of the major feature differences with Perl
SAX[a] (being discussed in the "SAX Comments" thread[b]).  Perl SAX has
always used a node as the argument to SAX events.  In our case,
though, it's not a "start element" or "end element" object, but just
an element node, a DOM node, to be exact.

For SAX2's namespaces, all that needed adding were the Prefix,
LocalName, and NamespaceURI properties on the nodes.

For "raw" parsing, one need only add "raw" information properties,
like "OriginalStartTag", "OriginalEndTag".  (No one's done that yet,
but it's on the wish list.)
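
The idea of growing the event by adding properties can be sketched in Java
as follows.  This is not the Perl SAX API: ElementNode, Handler, and the
"Name" key are hypothetical names invented for the illustration.  Only the
Prefix, LocalName, NamespaceURI, and OriginalStartTag property names come
from the discussion above.  For the sake of the demonstration the handler
returns a string instead of acting on the event.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ElementNodeSketch {
    // Hypothetical node passed to every event: a bag of named properties.
    static final class ElementNode {
        private final Map<String, String> properties = new LinkedHashMap<>();

        ElementNode set(String name, String value) {
            properties.put(name, value);
            return this;
        }

        // Returns null when the producer did not supply the property.
        String get(String name) {
            return properties.get(name);
        }
    }

    // The handler signature never changes, whatever the node carries.
    interface Handler {
        String startElement(ElementNode element);
    }

    // A handler written before namespaces: it reads only "Name".
    static final Handler NAME_ONLY = element -> element.get("Name");

    // A later handler that uses the namespace properties.
    static final Handler NAMESPACE_AWARE = element ->
        "{" + element.get("NamespaceURI") + "}" + element.get("LocalName");

    // Sample node as a namespace-aware, "raw"-preserving parser might fill it.
    static ElementNode sample() {
        return new ElementNode()
            .set("Name", "x:item")
            .set("Prefix", "x")
            .set("LocalName", "item")
            .set("NamespaceURI", "urn:example")
            .set("OriginalStartTag", "<x:item  id='1' >");
    }

    public static void main(String[] args) {
        ElementNode node = sample();
        System.out.println(NAME_ONLY.startElement(node));
        System.out.println(NAMESPACE_AWARE.startElement(node));
        System.out.println(node.get("OriginalStartTag"));
    }
}
```

The older handler keeps working unchanged when the parser starts supplying
namespace or raw-tag properties, which is the extensibility being argued
for here.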

Namespace support was such a "simple" change to Perl SAX, that we now
face the dilemma of changing the class names just because the SAX2
Java implementation had to change interfaces to support these new

"Marrying SAX and DOM" was discussed briefly here on xml-dev[c] and on
Python's XML-SIG[d,e].

> I don't see anything but advantages, one of them being the
> extensiblity: with this architecture, SAX2 would just have been a
> layer on top of SAX1.
> Have I miss something ?

Not that I can see.  In Perl, this has been working splendidly for about
18 months.
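
For Java readers, the layering can at least be approximated today by
wrapping the existing SAX2 callback.  Note that this is the reverse of
what is proposed above: it puts an object-based handler on top of SAX2,
not SAX2 on top of an object-based core.  StartElementEvent, ObjectHandler,
and Adapter are hypothetical names; only DefaultHandler, Attributes, and
the JAXP factory are real APIs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LayeredStartElement {
    // Hypothetical event object bundling the four SAX2 arguments.
    static final class StartElementEvent {
        final String namespaceURI;
        final String localName;
        final String qName;
        // Only valid during the callback; copy it if it must outlive the event.
        final Attributes attributes;

        StartElementEvent(String namespaceURI, String localName,
                          String qName, Attributes attributes) {
            this.namespaceURI = namespaceURI;
            this.localName = localName;
            this.qName = qName;
            this.attributes = attributes;
        }
    }

    // Hypothetical single-argument handler interface.
    interface ObjectHandler {
        void startElement(StartElementEvent event);
    }

    // Adapter: receives the SAX2 callback and forwards one event object.
    static final class Adapter extends DefaultHandler {
        private final ObjectHandler target;

        Adapter(ObjectHandler target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            target.startElement(new StartElementEvent(uri, localName, qName, atts));
        }
    }

    // Parses xml and returns "{namespaceURI}localName" for each start tag.
    static List<String> describe(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        List<String> out = new ArrayList<>();
        ObjectHandler handler =
            e -> out.add("{" + e.namespaceURI + "}" + e.localName);
        factory.newSAXParser()
            .parse(new InputSource(new StringReader(xml)), new Adapter(handler));
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            describe("<x:doc xmlns:x=\"urn:example\"><item/></x:doc>"));
    }
}
```

New information (xml:base, the raw tag text, and so on) would then be added
as fields or accessors on the event class, leaving ObjectHandler untouched.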

 -- Ken

> [1] http://www.xml.com/pub/a/2000/08/09/xslt/xslt.html
> [2] http://lists.xml.org/archives/xml-dev/200011/msg00551.html

[a] <http://bitsko.slc.ut.us/~ken/perl-xml/sax-2.0.html>
[b] <http://lists.xml.org/archives/xml-dev/200012/msg00047.html>
[c] <http://lists.xml.org/archives/xml-dev/200003/msg00316.html>
[d] <http://mail.python.org/pipermail/xml-sig/2000-February/001905.html>
[e] <http://mail.python.org/pipermail/xml-sig/2000-February/001907.html>
    follow the thread on "EasySAX"



Copyright 2001 XML.org. This site is hosted by OASIS