OASIS Mailing List Archives

   Re: SAX2 Lexical Handler Suggestions

  • From: Chris Pratt <chris@planetpratt.com>
  • To: XML-DEV <xml-dev@lists.xml.org>
  • Date: Wed, 12 Jul 2000 10:57:05 -0700

I can see that you come from the same camp as most XML programmers I've
encountered, which I would sum up as "XML is for documents, why would you
want it to do anything more?" Our system is a web-based authoring system
that parses XHTML/AML (AML being our own markup extensions) once and holds
it in memory as an executable tree, so that when each request for a page
comes in, a completely new HTML document can be created from the executable
XHTML template simply by running the chain. If all the entities in
attribute values (which we use as dynamic macros) are silently replaced by
the parser, we have no way of dynamically inserting the proper macro value
into the generated HTML.

As for knowing an attribute's raw text, I need some way to tell which
attribute values contained entity references that the processor expanded
and which were entered literally by the user. Knowing which attributes are
defaults and which actually came from the XML document is also very handy
for saving space in the downloaded HTML. I know that a SAX parser can
absolutely handle these changes, since I have a modified version of the
AElfred2 parser that currently supports them.
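
For the default-versus-specified distinction, SAX2's DeclHandler already
reports each declared default through attributeDecl(), and a later addition
to the SAX2 extensions, org.xml.sax.ext.Attributes2, offers isSpecified(int)
per attribute. The sketch below assumes the JDK's bundled SAX parser; the
class names, the page/lang/title document and the output format are made up
for illustration, and the instanceof check covers parsers whose attributes
do not implement Attributes2.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.ext.DefaultHandler2;

public class AttributeOriginDemo {

    /** Hypothetical recorder: collects ATTLIST declarations and per-attribute origin. */
    static class Recorder extends DefaultHandler2 {
        final List<String> declarations = new ArrayList<>();
        final List<String> attributes = new ArrayList<>();

        // DeclHandler callback, fired once per attribute declared in an <!ATTLIST ...>.
        @Override
        public void attributeDecl(String eName, String aName, String type,
                                  String mode, String value) {
            declarations.add(eName + "@" + aName + " type=" + type
                    + " mode=" + mode + " default=" + value);
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            for (int i = 0; i < atts.getLength(); i++) {
                // Attributes2.isSpecified(i) is false when the value was supplied
                // from a DTD default rather than written in the document.
                String origin = "unknown";
                if (atts instanceof Attributes2) {
                    origin = ((Attributes2) atts).isSpecified(i) ? "specified" : "defaulted";
                }
                attributes.add(qName + "@" + atts.getQName(i) + "="
                        + atts.getValue(i) + " (" + origin + ")");
            }
        }
    }

    static Recorder parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Recorder recorder = new Recorder();
        // DeclHandler events are only delivered to a handler registered via this property.
        parser.setProperty("http://xml.org/sax/properties/declaration-handler", recorder);
        parser.parse(new InputSource(new StringReader(xml)), recorder);
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE page [\n"
                + "  <!ATTLIST page lang CDATA \"en\" title CDATA #IMPLIED>\n"
                + "]>\n"
                + "<page title=\"Home\"/>";
        Recorder r = parse(xml);
        r.declarations.forEach(System.out::println);
        r.attributes.forEach(System.out::println);
    }
}
```

Here title is written in the document while lang comes from the DTD
default, so only title should be reported as specified.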

----- Original Message -----
From: "Vilya Harvey" <vilya.harvey@oxford.qss.co.uk>
To: "XML Development Interest Group" <xml-dev@lists.xml.org>
Sent: Wednesday, July 12, 2000 1:55 AM
Subject: Re: SAX2 Lexical Handler Suggestions

> Chris Pratt wrote:
> > First of all, what is the utility of the startEntity() and endEntity()
> > methods of the LexicalHandler? The end of the entity has definitely been
> > parsed before the call to startEntity(), since the name of the entity is a
> > parameter. And since a single entity can't bracket information (like an
> > element does), there is no utility in the endEntity() method, unless I'm
> > missing something obvious. In this case, I would suggest we rename the
> > startEntity() method to simply entity() and remove the endEntity()
> > method.
> startEntity() and endEntity() respectively indicate the start and end of
> the _replacement_text_ of an entity reference. This replacement text may
> contain characters which cause other handler methods to be invoked; that's
> why there need to be start and end methods for it. This could probably be
> made a little clearer in the documentation. I think the confusion mainly
> springs from the common misuse of the term 'entity' to mean 'entity
> reference', while SAX uses the term in its proper sense to mean the block
> of characters that is indicated by the reference.
> > Also, many systems (mine included) need to be able to tell the difference
> > between an entity in element data and an entity in an attribute value, so
> > I'd suggest adding a boolean parameter to the entity() method specifying
> > which of the two possible uses of entities has been found (i.e. public
> > void entity(String name, boolean isAttr);).
> Because of the way SAX has been designed to work, entities in attributes
> cannot be reported. They are just resolved silently by the parser.
> Why do you need to know when an entity is being processed? Apologies in
> advance if this treads on your (or anyone else's) toes, but I generally
> find that the "need" to know stems from a misunderstanding of their
> purpose. In some ways they are analogous to preprocessor macros in C: they
> get expanded and the result is processed as if it were part of the
> original document. The only thing that needs to know about preprocessor
> macros is the compiler (for debug information); likewise, the only thing
> that really needs to know about entity references (generally speaking) is
> the parser. They are essentially a shorthand mechanism, although they also
> allow external documents to be included.
> > My final suggestion is a method to detect when attributes are being
> > processed, given that there is a great amount of information that can be
> > held in an attribute definition. My current hack of the AElfred parser
> > defines a SAX2 Extension handler that supports the following call:
> <snip>
> All of the information in your attribute() method, with the exception of
> the raw attribute text, is available through the existing interface. As
> regards the raw attribute text, see my comments above. If you really do
> want to track entity reference names (particularly in attributes), I would
> suggest taking the source code to a parser (you obviously already have
> AElfred) and writing your own non-SAX API to it, rather than changing SAX
> to support this.
> Hope that helps,
> Vil.
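
Vil's two points, that startEntity()/endEntity() bracket the replacement
text and that entities in attribute values are expanded silently, can be
checked with a small event recorder. The LexicalHandler documentation
likewise lists general entities within attribute values among the entity
boundaries that cannot be reported. This is a sketch assuming the JDK's
bundled SAX2 parser; the document, the co entity and the class names are
invented for illustration. It registers a DefaultHandler2 as the
http://xml.org/sax/properties/lexical-handler property and joins character
chunks between events, so the content reference should show up as
startEntity:co, its expanded text, then endEntity:co, while the title
attribute arrives already expanded with no entity events.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class EntityBoundaryDemo {

    /** Hypothetical recorder that turns SAX callbacks into a flat list of strings. */
    static class Recorder extends DefaultHandler2 {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        // characters() may arrive in several chunks; join them between other events.
        private void flushText() {
            if (text.length() > 0) {
                events.add("text:" + text);
                text.setLength(0);
            }
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flushText();
            events.add("start:" + qName);
            for (int i = 0; i < atts.getLength(); i++) {
                events.add("attr:" + atts.getQName(i) + "=" + atts.getValue(i));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        // LexicalHandler callbacks: these bracket the entity's replacement text.
        @Override
        public void startEntity(String name) {
            flushText();
            events.add("startEntity:" + name);
        }

        @Override
        public void endEntity(String name) {
            flushText();
            events.add("endEntity:" + name);
        }
    }

    static List<String> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Recorder recorder = new Recorder();
        reader.setContentHandler(recorder);
        // Lexical events are only delivered to a handler registered via this property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ENTITY co \"Acme Widgets\">]>"
                + "<doc title=\"&co;\">Made by &co;.</doc>";
        parse(xml).forEach(System.out::println);
    }
}
```

Because the character data is only flushed when another event arrives, the
text between startEntity:co and endEntity:co is exactly the replacement
text, which is why SAX needs both a start and an end callback.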


Copyright 2001 XML.org. This site is hosted by OASIS