- From: David Brownell <firstname.lastname@example.org>
- To: Eric van der Vlist <email@example.com>, firstname.lastname@example.org
- Date: Tue, 05 Dec 2000 17:11:13 +0000
Summary: I don't see a problem here. No federal issue, as it were;
layering works fine already.
> In most of the papers I can read, SAX is opposed to DOM as a pull
> versus push.
> While this is certainly an important difference, I don't see it as the
> main difference, but I'd rather say that the main difference is that SAX
> and DOM are acting at different levels and that SAX is the most
> "neutral" interface, DOM being more biased by a specific interpretation
> of what is an XML document.
I see the functional difference as being that SAX is a callback
API, while DOM is basically a data structure -- and often one
that's not particularly task-appropriate. There are also some
differences in the data/infoset exposed, and very significant
ones in portability. (DOM still has no portable bootstrap API.)
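To make the callback-versus-data-structure point concrete, here is a minimal sketch that counts elements both ways. The class and method names are made up for illustration, and both parsers are obtained through JAXP, which is an assumption of this sketch rather than part of SAX or DOM proper.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not from the original message.
public class CallbackVsTree {

    // SAX: the parser calls back once per start tag; nothing is retained.
    static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // DOM: the whole document is built as a tree, then queried.
    static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c><d/></c></a>";
        System.out.println(countWithSax(xml));
        System.out.println(countWithDom(xml));
    }
}
```

Both methods report the same count; the difference is that the DOM version holds the entire tree in memory before the question can be asked.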
> Now, I'd like to go on by explaining what I think are the two weaknesses
> of SAX.
> The first of them is that the information isn't raw enough for some
> applications and that there is still an information loss in the
> interpretation that is done ...
Having looked at that issue in excruciating detail, I think it's
typically fair to say that "some applications" want an API that
presents lexical processing data. SAX is a parser API; that's not
what it was designed to address -- but a SAX2 extension could let a
parser expose lexical data, if it wanted to go there.
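One way such an extension can look is the LexicalHandler interface in org.xml.sax.ext, registered through a SAX2 property. The sketch below assumes the parser recognizes that property (setProperty throws otherwise); the class name is hypothetical. Note that it exposes only some lexical information (comments, CDATA and entity boundaries), not raw tokens.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not from the original message.
public class LexicalEvents {

    static List<String> collect(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        // Record only comments and CDATA boundaries; ignore the rest.
        LexicalHandler lexical = new LexicalHandler() {
            public void comment(char[] ch, int start, int length) {
                events.add("comment:" + new String(ch, start, length));
            }
            public void startCDATA() { events.add("startCDATA"); }
            public void endCDATA() { events.add("endCDATA"); }
            public void startDTD(String name, String publicId, String systemId) { }
            public void endDTD() { }
            public void startEntity(String name) { }
            public void endEntity(String name) { }
        };

        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Assumption: the parser supports this optional SAX2 property.
        reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", lexical);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<a><!--hi--><![CDATA[x]]></a>"));
    }
}
```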
> This second (and almost opposite) one is that in some cases, there isn't
> enough interpretation. The way SAX1 has needed to be modified to support
> the namespaces is a good example for this and the problem is likely to
> happen again as long as new features are added through modularization to
> XML 1.0.
Actually, SAX1 did not _need_ to be modified that way. There were
examples of doing such processing in layers above SAX1, even before
the one that got bundled into SAX2. That was a design choice, not
a structural imperative.
> I think that both are coming from a quest to find a balance and to
> define an API that will meet most of the needs (I could call it the "one
> fits all" utopia) and that this issue should be addressed by adding more
> modularity and layering rather than by adding more complexity to
> existing methods.
I agree about layering and modularity, but can't quite see why there
would be any problem achieving either of those with the current SAX.
Perhaps you're really wanting to see new layers get standardized? :-)
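As a rough illustration of layering with the SAX2 that exists today, the sketch below stacks a filter on top of a parser using XMLFilterImpl. The filter adds a "depth" attribute to every element before passing the event upward. DepthFilter, LayeredSax and the "depth" attribute are invented for this example; the parser is obtained through JAXP, which is also an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical illustration class; not from the original message.
public class LayeredSax {

    // One layer: enriches each startElement event with its nesting depth.
    static class DepthFilter extends XMLFilterImpl {
        private int depth;

        DepthFilter(XMLReader parent) { super(parent); }

        @Override
        public void startDocument() throws SAXException {
            depth = 0;
            super.startDocument();
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            AttributesImpl richer = new AttributesImpl(atts);
            richer.addAttribute("", "depth", "depth", "CDATA",
                                Integer.toString(depth));
            depth++;
            super.startElement(uri, localName, qName, richer);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            depth--;
            super.endElement(uri, localName, qName);
        }
    }

    // The application sits above the filter and sees the enriched events.
    static List<String> run(String xml) throws Exception {
        final List<String> seen = new ArrayList<>();
        XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        XMLReader layered = new DepthFilter(parser);
        layered.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add(qName + "@" + atts.getValue("depth"));
            }
        });
        layered.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a><b><c/></b><d/></a>"));
    }
}
```

Since a filter is itself an XMLReader, further layers can be stacked the same way, each wrapping the one below.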
> Last point, why do I call it a layered interface ?
> Because we could define on top of this a layered architecture where a
> single event would get richer by each layer it comes through.
> The first layer could be the recognition of the basic XML productions.
Which productions -- the lexical ones, or the grammatical ones? I count
two layers there. (Evidently from its SGML heritage, XML doesn't have
the cleanest of distinctions between those layers, but it exists.) The
SAX API is basically a grammatical layer.
> A second layer could be to include entity processing and well-formedness
Actually some of the XML rules require WF checks at a lexical level,
while some are purely grammatical or content-based. Entities are
basically processed in the boundary between lexical and syntactical
processing -- "&foo;" or "%bar;" need lexical exposure, but basically
they're invisible otherwise. (Yes, I'm partitioning the infoset into
classic categories there.)
> Next layers would include namespaces and scoped attributes.
Hmm, you omitted validation. Though it's known that validation can
basically be done as a layer over SAX2 ... and that any such layers
don't actually need to be "SAX (tm)" branded.
> I don't see anything but advantages, one of them being the extensibility:
> with this architecture, SAX2 would just have been a layer on top of
> Have I missed something?
Well, there are already SAX2 wrappers of SAX1 parsers that work
exactly that way -- except for "optional" features.
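One such wrapper is org.xml.sax.helpers.ParserAdapter, which presents a SAX1 Parser as a SAX2 XMLReader and does the namespace processing in the adapter layer. The sketch below is only an illustration: the class name is made up, and obtaining the SAX1 parser through JAXP's deprecated getParser() is an assumption about the environment.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.ParserAdapter;

// Hypothetical illustration class; not from the original message.
@SuppressWarnings("deprecation")
public class Sax1Wrapped {

    // Returns "{namespace-uri}local-name" for the first element seen.
    static String firstElement(String xml) throws Exception {
        // Assumption: JAXP can hand out a SAX1 Parser (deprecated API).
        Parser sax1 = SAXParserFactory.newInstance().newSAXParser().getParser();

        // The adapter turns SAX1 events into namespace-aware SAX2 events.
        XMLReader sax2 = new ParserAdapter(sax1);

        final String[] result = {null};
        sax2.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (result[0] == null) {
                    result[0] = "{" + uri + "}" + localName;
                }
            }
        });
        sax2.parse(new InputSource(new StringReader(xml)));
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstElement("<p:a xmlns:p='urn:x'><p:b/></p:a>"));
    }
}
```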