


   Re: SAX2 Lexical Handler Suggestions

  • From: Vilya Harvey <vilya.harvey@oxford.qss.co.uk>
  • To: XML Development Interest Group <xml-dev@lists.xml.org>
  • Date: Wed, 12 Jul 2000 09:55:46 +0100

Chris Pratt wrote:
> First of all, what is the utility of the startEntity() and endEntity()
> methods of the LexicalHandler?  The end of the entity has definitely been
> parsed before the call to startEntity since the name of the entity is a
> parameter.  And since a single entity can't bracket information (like an
> element does) there is no utility in the endEntity() method, unless I'm
> missing something obvious.  In this case, I would suggest we rename the
> startEntity() method to simply entity() and remove the endEntity() method.

startEntity() and endEntity() respectively indicate the start and end of the
_replacement_text_ of an entity reference. This replacement text may itself
contain characters which cause other handler methods to be invoked; that's why
there need to be start & end methods for it. This could probably be made a
little clearer in the documentation. I think the confusion mainly springs from
the common misuse of the term 'entity' to mean 'entity reference', while SAX
uses the term in its proper sense to mean the block of characters that is
indicated by the reference.
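
To make the nesting concrete, here is a rough Java sketch (mine, not taken from
the SAX distribution). It assumes a parser that supports the standard
lexical-handler property, such as the one bundled with the JDK; the class name
EntityTrace and the sample document are made up for illustration. The entity's
replacement text contains an element, so element events should arrive between
the startEntity() and endEntity() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: records the order of entity and element events.
public class EntityTrace {

    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startEntity(String name) {
                events.add("startEntity " + name);
            }
            @Override public void endEntity(String name) {
                events.add("endEntity " + name);
            }
            @Override public void startElement(String uri, String local,
                                               String qName, Attributes atts) {
                events.add("startElement " + qName);
            }
            @Override public void endElement(String uri, String local,
                                             String qName) {
                events.add("endElement " + qName);
            }
        };

        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Standard SAX2 property id for registering a LexicalHandler.
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        // The replacement text of &e; contains markup of its own.
        String xml = "<!DOCTYPE doc [<!ENTITY e \"<b>hi</b>\">]>"
                   + "<doc>&e;</doc>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

The events for <b> fall inside the start/end pair for "e", which is exactly the
bracketing that a single entity() callback could not express.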

> Also many systems (mine included) need to be able to tell the difference
> between an entity in element data and an entity as an attribute value, so
> I'd suggest adding a boolean parameter to the entity() method specifying
> which of the two possible uses of entities has been found (i.e. public void
> entity (String name,boolean isAttr);).

Because of the way SAX has been designed to work, entities in attributes cannot
be reported. They are just resolved silently by the parser.
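
A small Java sketch of what that looks like in practice, again assuming the
JDK's bundled parser and the standard lexical-handler property; the class name
AttributeEntityDemo and the entity "co" are invented for the example. The same
entity is referenced once in an attribute value and once in element content.
The handler should see the attribute value already expanded, and should get an
entity event only for the reference in content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo: entity references in attribute values are expanded
// by the parser before the handler ever sees the attribute.
public class AttributeEntityDemo {

    // Returns the value of the "owner" attribute on the root element and
    // appends the name of every entity reported via startEntity().
    static String parse(String xml, final List<String> entityNames)
            throws Exception {
        final String[] owner = new String[1];

        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startEntity(String name) {
                entityNames.add(name);
            }
            @Override public void startElement(String uri, String local,
                                               String qName, Attributes atts) {
                if (owner[0] == null) {
                    owner[0] = atts.getValue("owner");
                }
            }
        };

        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return owner[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ENTITY co \"Acme\">]>"
                   + "<doc owner=\"&co; Ltd\">&co;</doc>";
        List<String> names = new ArrayList<>();
        String owner = parse(xml, names);
        System.out.println("owner attribute: " + owner);
        System.out.println("entities reported: " + names);
    }
}
```

Nothing in the Attributes object records that part of the value came from an
entity; by the time startElement() is called the boundary is gone.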

Why do you need to know when an entity is being processed? Apologies in advance
if this treads on your (or anyone else's) toes, but I generally find that this
"need" to know stems from a misunderstanding of what entities are for. In some
ways they are analogous to preprocessor macros in C: they get expanded and the
result is processed as if it were part of the original document. The only thing
that needs to know about preprocessor macros is the compiler (for generating
debug information); likewise, the only thing that really needs to know about
entity references (generally speaking) is the parser. They are essentially a
shorthand mechanism, although they also allow external documents to be

> My final suggestion is a method to detect when attributes are encountered
> being that there is a great amount of information that can be disseminated
> in an attribute definition.  My current hack of the AElfred parser defines a
> SAX2 Extension handler that supports the following call:

All of the information in your attribute() method, with the exception of the
raw attribute text, is available through the existing interface. As regards the
raw attribute text, see my comments above. If you really do want to process
entity reference names (particularly in attributes), I would suggest grabbing
the source code to a parser (you obviously already have AElfred) and adding
your own non-SAX API to it, rather than changing SAX to support this.

Hope that helps,

