
filtering noise (was Re: SAX LexicalHandler::comment issue)

On 05 Jul 2001 13:24:56 -0700, David Brownell wrote: 
> > Writing a SAX filter is driving me to once again question the wisdom of
> > the existence of attributes, but I can't say I'd seriously propose that
> > they be discarded by default.
> To me, it comes down to not wanting to be stuck with the
> syntactic sugar DOM insists on.  I don't see attributes as
> being in that category, since they hold real data.  I'd rather
> just not spend the memory.

That doesn't strike me as a problem of the DOM - it strikes me as a
processing problem that hasn't been well-solved.

The DOM (and the Infoset, IMHO) needs to be able to represent everything XML
1.0 offers.  People who need less should be able to turn those things
off.  Rather than battling algorithms for normalizing DOM node sets, it
seems a lot easier to filter out (if you want) comments, PIs, CDATA
sections, and ignorable whitespace in a SAX context, long before the
nodes are created.
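
To make that concrete, here is a minimal sketch of such a filter in Python,
using the standard library's xml.sax module. The names NoiseFilter and
strip_noise are made up for illustration, and XMLGenerator simply stands in
for whatever tree builder would sit downstream. Attributes pass through
untouched.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class NoiseFilter(XMLFilterBase):
    """Hypothetical filter: swallow PIs and ignorable whitespace,
    forward every other ContentHandler event unchanged."""

    def processingInstruction(self, target, data):
        pass  # drop the event instead of forwarding it

    def ignorableWhitespace(self, chars):
        # Only a parser that knows the content model calls this; the
        # bundled expat driver, as far as I know, reports all whitespace
        # through characters(), so this override is inert there.
        pass


def strip_noise(xml_text):
    """Parse xml_text through NoiseFilter and return the re-serialized XML."""
    out = io.StringIO()
    filt = NoiseFilter(xml.sax.make_parser())
    # XMLGenerator is a stand-in for a downstream tree builder.
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    # Comments and CDATA markers never reach the filter here: the expat
    # driver reports them only to a LexicalHandler, which is left unset.
    filt.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '<?xml-stylesheet href="a.xsl"?>'
        '<doc><!-- note --><?pi data?><a x="1">text</a></doc>'
    )
    print(strip_noise(sample))
```

Anything downstream of the filter only ever sees elements, attributes, and
character data, so the nodes for the rest are never built in the first place.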

Unfortunately, no one seemed to like the
controlled-streaming-into-a-tree model at the time these things started,
and now we've just got pileups.