xml-dev - General SAX2 Observations

From: "Box, Don" <dbox@develop.com>
To: "'xml-dev@xml.org'" <xml-dev@XML.ORG>, "David Megginson (E-mail)" <david@megginson.com>
Date: Sun, 12 Mar 2000 15:30:02 -0800

Sorry to chime in with these observations so late in the game, but my understanding of how SAX2 fits into the overall XML puzzle is evolving fairly rapidly. BTW, these observations are just that, observations. In general, I am a very happy SAX2b2 user.

0) I think that the independence of ContentHandler, LexicalHandler, DTDHandler and ErrorHandler is carried a bit too far. In general, I would like to be able to ask the content handler for the lexical handler interface (a la Java's cast operator or COM's QueryInterface). To allow two distinct objects to be used, I would bake in the getter method instead of relying on the underlying type system.
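To make the idea concrete, here is a rough sketch of what a baked-in getter might look like. The interface name LexicalAwareContentHandler, its getLexicalHandler() method and the HandlerLookup helper are hypothetical and not part of SAX2; DefaultHandler and DefaultHandler2 come from the SAX2 classes shipped with current JDKs and are used only to keep the example short.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical extension interface (an assumption, not part of SAX2).
interface LexicalAwareContentHandler extends ContentHandler {
    // Returns the object that should receive lexical events, or null if none.
    LexicalHandler getLexicalHandler();
}

// Case 1: a single object plays both roles and simply returns itself.
class CombinedHandler extends DefaultHandler2 implements LexicalAwareContentHandler {
    @Override
    public LexicalHandler getLexicalHandler() {
        return this;
    }
}

// Case 2: two distinct objects; a cast could never discover the second one.
class SplitHandler extends DefaultHandler implements LexicalAwareContentHandler {
    private final LexicalHandler lexical = new DefaultHandler2();

    @Override
    public LexicalHandler getLexicalHandler() {
        return lexical;
    }
}

// Hypothetical helper showing how a producer could look the handler up.
class HandlerLookup {
    static LexicalHandler lexicalHandlerFor(ContentHandler handler) {
        if (handler instanceof LexicalAwareContentHandler) {
            return ((LexicalAwareContentHandler) handler).getLexicalHandler();
        }
        // Fall back to the type system, which only works for the single-object case.
        return (handler instanceof LexicalHandler) ? (LexicalHandler) handler : null;
    }
}
```

The producer only ever needs the content handler; whether the lexical handler is the same object or a separate one becomes an implementation detail of the receiver.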

1) The feature mechanism is busted wrt the two namespace-related features. It assumes that the ContentHandler interface will only be called by an XMLReader implementation. This makes it tough for folks who want to use ContentHandler (and friends) as a generic interface for modeling the Infoset as a stream of method invocations (see David Brownell's stuff for an example of one such architecture), since one doesn't know generically whether to deliver namespace declarations as attributes. In general, the "consumer" of ContentHandler should be able to interrogate the namespace management policy of the receiver of the SAX events. Ideally, the ContentHandler would have the following method:

public abstract int getNamespaceSupport();

that would return one of the following three values:

    static final int NAMESPACES_ONLY = 0;
    static final int RAW_NAMES_ONLY = 1;
    static final int RAW_NAMES_PLUS_NAMESPACES = 2;

BTW, this is not to bag on the feature mechanism in general. Rather, because the protocol of ContentHandler changes so dramatically based on the two namespace-related features, I think their discovery needs to be more tightly bound to the actual receiver of the namespace-related events. Given the method shown here, a given XMLReader implementation could simply query the ContentHandler once prior to delivering the startDocument event and configure itself accordingly.
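As a rough sketch of the query-once idea, the following shows a non-parser producer asking the receiver for its policy before startDocument. The names NamespacePolicyHandler, RecordingHandler and SingleElementProducer are hypothetical, and the mapping of the three values onto concrete events (prefix-mapping events versus xmlns attributes, empty strings for the names not supplied) is my assumption about how the policy would play out, not something SAX2 defines.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical extension of ContentHandler carrying the proposed method.
interface NamespacePolicyHandler extends ContentHandler {
    int NAMESPACES_ONLY = 0;
    int RAW_NAMES_ONLY = 1;
    int RAW_NAMES_PLUS_NAMESPACES = 2;

    int getNamespaceSupport();
}

// Receiver that announces a policy and records what it is sent.
class RecordingHandler extends DefaultHandler implements NamespacePolicyHandler {
    final List<String> log = new ArrayList<>();
    private final int policy;

    RecordingHandler(int policy) { this.policy = policy; }

    @Override public int getNamespaceSupport() { return policy; }

    @Override public void startPrefixMapping(String prefix, String uri) {
        log.add("mapping " + prefix + "=" + uri);
    }

    @Override public void startElement(String uri, String localName,
                                       String rawName, Attributes atts) {
        log.add("element uri=" + uri + " local=" + localName
                + " raw=" + rawName + " attrs=" + atts.getLength());
    }
}

// Producer (not an XMLReader) emitting <p:doc xmlns:p="urn:example"/>.
class SingleElementProducer {
    static void emit(NamespacePolicyHandler handler) throws SAXException {
        // Query the receiver once, before startDocument, and configure accordingly.
        int policy = handler.getNamespaceSupport();
        boolean namespaces = policy != NamespacePolicyHandler.RAW_NAMES_ONLY;
        boolean rawNames = policy != NamespacePolicyHandler.NAMESPACES_ONLY;

        String uri = namespaces ? "urn:example" : "";
        String localName = namespaces ? "doc" : "";
        String rawName = rawNames ? "p:doc" : "";

        AttributesImpl atts = new AttributesImpl();
        if (rawNames) {
            // Assumed: the declaration travels as an attribute when raw names are wanted.
            atts.addAttribute("", "", "xmlns:p", "CDATA", "urn:example");
        }

        handler.startDocument();
        if (namespaces) handler.startPrefixMapping("p", "urn:example");
        handler.startElement(uri, localName, rawName, atts);
        handler.endElement(uri, localName, rawName);
        if (namespaces) handler.endPrefixMapping("p");
        handler.endDocument();
    }
}
```

Calling SingleElementProducer.emit with handlers constructed for each of the three values yields three different event streams for the same element, which is the protocol shift described above.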

2) NamespaceSupport needs to be broken into an interface/implementation pair (a la Attributes/AttributesImpl). Ideally, I would like to be able to swap in different implementations of NamespaceSupport for performance reasons. Additionally, environments like C++ and COM don't lend themselves to having classes shared across DLL boundaries, so using an interface would make SAX2 consistent across Java, C++ and COM. One could make a similar argument about InputSource, and arguably it is more important to factor InputSource since it appears as a method parameter on a core interface. That stated, I would refactor both for consistency.
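A minimal sketch of such a split, assuming a hypothetical interface named PrefixScopes that covers only a few of the operations: one implementation delegates to the existing org.xml.sax.helpers.NamespaceSupport class, the other is a deliberately naive map-based stand-in for "some other implementation". Neither the interface nor the class names below exist in SAX2.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical interface; a subset of what NamespaceSupport offers today.
interface PrefixScopes {
    void pushContext();
    void popContext();
    boolean declarePrefix(String prefix, String uri);
    String getURI(String prefix);
}

// Implementation 1: thin adapter over the existing helper class.
class DelegatingPrefixScopes implements PrefixScopes {
    private final NamespaceSupport delegate = new NamespaceSupport();

    public void pushContext() { delegate.pushContext(); }
    public void popContext() { delegate.popContext(); }
    public boolean declarePrefix(String prefix, String uri) {
        return delegate.declarePrefix(prefix, uri);
    }
    public String getURI(String prefix) { return delegate.getURI(prefix); }
}

// Implementation 2: a simple stack of maps, swapped in behind the same interface.
class MapPrefixScopes implements PrefixScopes {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    MapPrefixScopes() {
        scopes.push(new HashMap<>()); // base scope
    }

    public void pushContext() { scopes.push(new HashMap<>()); }
    public void popContext() { scopes.pop(); }

    public boolean declarePrefix(String prefix, String uri) {
        scopes.peek().put(prefix, uri);
        return true;
    }

    public String getURI(String prefix) {
        // The deque iterates from the innermost scope outward.
        for (Map<String, String> scope : scopes) {
            String uri = scope.get(prefix);
            if (uri != null) return uri;
        }
        return null;
    }
}
```

Code written against PrefixScopes does not care which of the two it is handed, which is the property that makes swapping for performance reasons (or crossing a component boundary) possible.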

3) Minor nit. Wouldn't it be more convenient if the "rawName" parameter/property were replaced with "prefix", especially since doing so would create a nice correlation with the namespace declaration events? Since the Name production of XML 1.0 doesn't allow a Name to begin with a colon, there is no loss of information, and it is simpler to catenate the two strings together than it is to parse for the colon.
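For comparison, the two directions look roughly like this. The QNames class and its method names are made up for illustration, and the sketch assumes an empty string stands for "no prefix".

```java
// Hypothetical helper contrasting the two directions.
class QNames {
    // Given prefix and local name, building the raw name is one concatenation.
    static String toRawName(String prefix, String localName) {
        return prefix.isEmpty() ? localName : prefix + ":" + localName;
    }

    // Given only the raw name, recovering the prefix means scanning for the colon.
    static String prefixOf(String rawName) {
        int colon = rawName.indexOf(':');
        return colon < 0 ? "" : rawName.substring(0, colon);
    }
}
```

For example, QNames.toRawName("xsl", "template") gives "xsl:template", and QNames.prefixOf("xsl:template") gives back "xsl".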

4) I am really interested in the C++ and COM mappings of SAX2. I have already made a pass at the latter and am concerned that the former not try to reinvent COM if the goal is to allow C++ code to work across DLL/so boundaries. I see a lot of opportunities for synergy between the C++ and COM mappings and would hate to see people craft yet another C++ component model without leveraging the ideas of things like John Lakos's work, COM and Mozilla's XPCOM. There are some extremely simple guidelines that I hope will be followed, including the following:

a) Hoist all shared types into pure abstract base classes or flat structures.
b) Never allow class-based references or instances to be shared across component boundaries. This includes std::string!
c) Understand that RTTI doesn't work across component boundaries.
d) Understand that exception handling doesn't work across component boundaries.
e) Understand the tradeoffs of supporting both char and wchar_t.
f) Understand that malloc/free and new/delete don't work across component boundaries.

All of these observations can be ignored if you DON'T care about dynamic linking/loading in C++. However, if you do care about dynamic linking/loading, there is a family of idioms (many of which are described in John Lakos's "Large Scale C++" book, others of which are described in chapter 1 of my book, "Essential COM") that should be adhered to.

DB
http://www.develop.com/dbox

Follow-Ups:
- SAX2: Namespace support
  - From: David Megginson <david@megginson.com>
- SAX2: Handler Interfaces
  - From: David Megginson <david@megginson.com>
- SAX2: Querying the client
  - From: David Megginson <david@megginson.com>
- Re: General SAX2 Observations
  - From: David Brownell <david-b@pacbell.net>
- Re: General SAX2 Observations
  - From: "Jon Smirl" <jonsmirl@mediaone.net>
