Re: SAX2 ... missing features?
- From: David Brownell <email@example.com>
- To: Rob Lugt <firstname.lastname@example.org>, email@example.com
- Date: Fri, 13 Jul 2001 11:57:17 -0700
Thanks for this list, Rob ...
> > SAX2 has been out for a while, and I'm curious what folks'
> > pet peeves are. Please share! I'd hope that some of these
> > could turn into (backwards compatible) updates.
> David, here is my list, again in no particular order:
> -New Features:
> * concatenate-characters
> Setting this feature would cause ContentHandlers to concatenate contiguous
> characters (including resolved entities resulting in character data) into a
> single call to characters(). This must be a very common requirement, one
> that applications have to manage themselves at present (and some probably
> are unaware of the need). This could also be achieved with a filter; see below.
Yep, I'd prefer to see some sort of filter component for this
type of task (and this one in particular).
It's possible to use a filter in cases like a DOM parser
(walking a DOM Document and generating SAX2 events),
or cases like apps turning their data into a stream of such
events. Building it into parsers (as a feature flag) would not
help SAX code that's not parser-centric.
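A minimal sketch of such a filter, using only stock SAX2 classes and the JDK's bundled parser (the class names ConcatenatingFilter and ConcatDemo are hypothetical). It buffers characters() calls and forwards the accumulated text as one call just before any other ContentHandler event. Because it is a pipeline stage rather than a parser feature, the same class could sit behind a DOM walker or any other event source. CDATA boundaries are reported through LexicalHandler, which this filter does not intercept, so CDATA text is merged with its neighbours.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: buffers contiguous characters() calls and forwards them
// as a single call just before the next non-character ContentHandler event.
class ConcatenatingFilter extends XMLFilterImpl {
    private final StringBuilder buffer = new StringBuilder();

    ConcatenatingFilter(XMLReader parent) {
        super(parent);
    }

    // Deliver any pending text downstream as one characters() call.
    private void flush() throws SAXException {
        if (buffer.length() > 0) {
            char[] text = buffer.toString().toCharArray();
            buffer.setLength(0);
            super.characters(text, 0, text.length);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // defer delivery
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        flush();
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        flush();
        super.endElement(uri, local, qName);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        flush();
        super.ignorableWhitespace(ch, start, length);
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    @Override
    public void skippedEntity(String name) throws SAXException {
        flush();
        super.skippedEntity(name);
    }

    @Override
    public void endDocument() throws SAXException {
        flush();
        super.endDocument();
    }
}

class ConcatDemo {
    // Parses xml through the filter and records each characters() chunk seen downstream.
    static List<String> chunks(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            ConcatenatingFilter filter = new ConcatenatingFilter(parser);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    seen.add(new String(ch, start, length));
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(chunks("<a>x &amp; y<![CDATA[ z]]></a>"));
    }
}
```

Here the entity reference and the CDATA section would normally split the text into several callbacks; downstream, the handler sees a single chunk, "x & y z".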
> * preserve-systemIds
> Setting this feature would prevent the SAX Driver from making system
> identifiers absolute before calling the EntityResolver. This has been
> previously suggested and is registered in SourceForge #434478
Ah, where in SourceForge? It's not in the SAX project.
That's probably a reasonable way to address that feature, though
I still think the XML and infoset specs need updating to make clear
it's permitted. (It's possible this is yet another case of assuming
that the spec says what SGML does, despite the language.)
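To observe the behaviour Rob describes, the sketch below (hypothetical class SystemIdDemo, assuming the JDK's bundled SAX parser) records the system ID handed to an EntityResolver for a relative DOCTYPE reference. With that parser the resolver typically receives an already absolutized URI, resolved against the document's base, rather than the literal "foo.dtd" written in the document; the exact URI spelling may vary between parsers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SystemIdDemo {
    // Parses xml (using docSystemId as its base URI) and returns every
    // system ID the parser passed to the EntityResolver.
    static List<String> resolvedIds(String xml, String docSystemId) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver((publicId, systemId) -> {
                seen.add(systemId);
                // Hand back an empty DTD so nothing is fetched from disk or network.
                return new InputSource(new StringReader(""));
            });
            InputSource in = new InputSource(new StringReader(xml));
            in.setSystemId(docSystemId); // base for resolving relative references
            reader.parse(in);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // The document refers to "foo.dtd"; print what the resolver actually saw.
        System.out.println(resolvedIds("<!DOCTYPE r SYSTEM \"foo.dtd\"><r/>", "file:///tmp/doc.xml"));
    }
}
```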
> -New Filter Classes
> I wonder why the XMLFilter lumps all the core interfaces together into one
> filter. Would it not make more sense to follow the typical Java i/o model
> and create Filter classes for each type of Handler?
Actually it bothers me that it omits the Decl and Lexical handlers, and
that it merges in XMLReader and EntityResolver -- which aren't pure
I actually prefer the notion of having a pipeline filter stage group all the
infoset properties together. As an architectural abstraction, that's less
error prone and (closely related) is easier to conceptualize/explain.
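As a sketch of what folding lexical events into the same filter stage might look like (LexicalFilter and LexicalDemo are hypothetical; XMLFilterImpl, LexicalHandler and DefaultHandler2 are standard), this subclass keeps the downstream lexical handler for itself and registers itself on the parent parser under the lexical-handler property. Comments and CDATA boundaries then pass through the same stage as the core events. A DeclHandler could presumably be interposed the same way through the declaration-handler property.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: XMLFilterImpl forwards only the core handlers, so this
// subclass also sits in the LexicalHandler chain.
class LexicalFilter extends XMLFilterImpl implements LexicalHandler {
    static final String LEXICAL = "http://xml.org/sax/properties/lexical-handler";
    private LexicalHandler next; // downstream lexical handler, may be null

    LexicalFilter(XMLReader parent) { super(parent); }

    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (LEXICAL.equals(name)) next = (LexicalHandler) value; // keep it, don't pass up
        else super.setProperty(name, value);
    }

    @Override
    public Object getProperty(String name)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        return LEXICAL.equals(name) ? next : super.getProperty(name);
    }

    @Override
    public void parse(InputSource input) throws SAXException, IOException {
        getParent().setProperty(LEXICAL, this); // parser now reports lexical events here
        super.parse(input);
    }

    // Each LexicalHandler event is forwarded unchanged if a downstream handler exists.
    public void comment(char[] ch, int start, int length) throws SAXException {
        if (next != null) next.comment(ch, start, length);
    }
    public void startCDATA() throws SAXException { if (next != null) next.startCDATA(); }
    public void endCDATA() throws SAXException { if (next != null) next.endCDATA(); }
    public void startDTD(String name, String publicId, String systemId) throws SAXException {
        if (next != null) next.startDTD(name, publicId, systemId);
    }
    public void endDTD() throws SAXException { if (next != null) next.endDTD(); }
    public void startEntity(String name) throws SAXException { if (next != null) next.startEntity(name); }
    public void endEntity(String name) throws SAXException { if (next != null) next.endEntity(name); }
}

class LexicalDemo {
    // Returns the comment and CDATA events that reached the downstream handler.
    static List<String> events(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LexicalFilter filter = new LexicalFilter(parser);
            DefaultHandler2 sink = new DefaultHandler2() {
                @Override public void comment(char[] ch, int start, int length) {
                    seen.add("comment:" + new String(ch, start, length));
                }
                @Override public void startCDATA() { seen.add("startCDATA"); }
                @Override public void endCDATA() { seen.add("endCDATA"); }
            };
            filter.setContentHandler(sink);
            filter.setProperty(LexicalFilter.LEXICAL, sink);
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(events("<a><!--hi--><![CDATA[x]]></a>"));
    }
}
```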
> This would allow
> applications to have finer-grained control, and it would offer the same
> facilities for SAX extensions. For example, I could foresee a use for a
> CharacterConcatenationFilter which implements ContentHandlerFilter,
> concatenating contiguous characters as described above.
But that can be done as it stands today, with no changes. (Except that
it can't preserve CDATA section boundaries ...)
> -Provide streaming interfaces for comments and PIs
> Currently the SAX Driver is forced to buffer the comment/PI text which can
> be arbitrarily large.
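Purely to illustrate the shape of such an interface: the sketch below defines a hypothetical CommentStreamHandler (not part of SAX2), modelled on the way characters() delivers text in chunks, plus a consumer that measures comment length without ever buffering the text. The driver is simulated by a loop; a real driver would call these methods while scanning the comment.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL interface, not part of SAX2: a streaming alternative to
// LexicalHandler.comment() that reports comment text in chunks.
interface CommentStreamHandler {
    void startComment();
    void commentChunk(char[] ch, int start, int length);
    void endComment();
}

// Example consumer: records each comment's length without storing its text.
class CommentLengthCounter implements CommentStreamHandler {
    private long current;
    private final List<Long> lengths = new ArrayList<>();

    public void startComment() { current = 0; }
    public void commentChunk(char[] ch, int start, int length) { current += length; }
    public void endComment() { lengths.add(current); }

    List<Long> lengths() { return lengths; }
}

class CommentStreamDemo {
    // Simulates a driver emitting one comment in chunks of at most chunkSize chars.
    static List<Long> run(String commentText, int chunkSize) {
        CommentLengthCounter counter = new CommentLengthCounter();
        char[] text = commentText.toCharArray();
        counter.startComment();
        for (int i = 0; i < text.length; i += chunkSize) {
            counter.commentChunk(text, i, Math.min(chunkSize, text.length - i));
        }
        counter.endComment();
        return counter.lengths();
    }

    public static void main(String[] args) {
        System.out.println(run("a very long comment", 4));
    }
}
```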
> -Provide raw content model and internal entity values
> The DeclHandler provides useful information to create DTD documentation.
> Additional pieces of information that are missing, however, are the raw values
> (where %PE;s are unexpanded) of content models and internal entity values.
> This is used to good effect in other DTD documentation utilities such as
> Norman Walsh's DTDParse.
I think the "xml-string" property (not widely implemented) permits this
already. There's a corresponding problem for start tags: entity refs
within attribute values are only visible in expanded form.
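A small sketch of the limitation Rob describes, assuming the JDK's bundled parser (the class DeclDemo is hypothetical): with a DeclHandler registered through the standard declaration-handler property, an element whose content model is written as %model; is reported with the parameter entity already replaced by its text. The DTD is served from memory through an EntityResolver, because parameter-entity references inside markup declarations are only allowed in the external subset.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.ext.DefaultHandler2;

class DeclDemo {
    // External DTD whose content model for "r" is written via a parameter entity.
    static final String DTD =
          "<!ENTITY % model \"(a|b)\">\n"
        + "<!ELEMENT r %model;>\n"
        + "<!ELEMENT a EMPTY>\n"
        + "<!ELEMENT b EMPTY>\n";

    // Maps element name to the content model string reported by DeclHandler.
    static Map<String, String> models() {
        Map<String, String> seen = new LinkedHashMap<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DeclHandler decls = new DefaultHandler2() {
                @Override
                public void elementDecl(String name, String model) {
                    seen.put(name, model);
                }
            };
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", decls);
            // Serve the external subset from memory instead of the file system.
            reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader(DTD)));
            reader.parse(new InputSource(new StringReader("<!DOCTYPE r SYSTEM \"doc.dtd\"><r/>")));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // The model for "r" arrives expanded; "%model;" itself is not visible.
        System.out.println(models());
    }
}
```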
> -Namespace mapping
> The current design of ContentHandler (startPrefixMapping/endPrefixMapping)
> requires that the application maintain a stack of scoped namespace contexts
> (probably by using NamespaceSupport). Each call to startPrefixMapping must
> push a prefix/uri pair onto the stack, and calls to endPrefixMapping should
> pop the stack. It would be easier for applications if there was an
> additional method called changePrefixMapping which provided the latest
> prefix/uri mapping. In this way the application could simply maintain a map
> rather than a stack. Probably too late to do anything about this though ;-(
Maybe a little easier, but that's likely stuff that the namespace helper class
should be able to handle already. Not _that_ much easier ... :)
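For reference, the usual pattern with org.xml.sax.helpers.NamespaceSupport looks roughly like this (the handler class ScopedPrefixHandler is hypothetical). NamespaceSupport keeps the context stack itself: the handler pushes one context per element, before the first declarePrefix for that element, and pops it in endElement, so the bookkeeping Rob describes reduces to a flag and a few calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.NamespaceSupport;

// Tracks in-scope prefixes with NamespaceSupport, which maintains the stack.
class ScopedPrefixHandler extends DefaultHandler {
    private final NamespaceSupport ns = new NamespaceSupport();
    private boolean contextPushed = false; // already pushed by startPrefixMapping?
    final List<String> log = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Mappings arrive before their element's startElement; open its context now.
        if (!contextPushed) {
            ns.pushContext();
            contextPushed = true;
        }
        ns.declarePrefix(prefix, uri);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (!contextPushed) ns.pushContext(); // element declared no new prefixes
        contextPushed = false;
        log.add(local + "->" + ns.getURI("p")); // current binding of prefix "p"
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        ns.popContext(); // discard this element's declarations
    }

    static List<String> run(String xml) {
        ScopedPrefixHandler handler = new ScopedPrefixHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for start/endPrefixMapping events
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.log;
    }

    public static void main(String[] args) {
        System.out.println(run("<a xmlns:p=\"urn:one\"><p:b xmlns:p=\"urn:two\"/><p:c/></a>"));
    }
}
```

For that input the log should read a->urn:one, b->urn:two, c->urn:one: the inner redeclaration of "p" is visible only inside b.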
> Rob Lugt
> ElCel Technology
>  http://www.nwalsh.com/perl/dtdparse/index.html