xml-dev - Re: SAX2 Lexical Handler Suggestions

From: Chris Pratt <chris@planetpratt.com>
To: XML-DEV <xml-dev@lists.xml.org>
Date: Wed, 12 Jul 2000 10:57:05 -0700

I can see that you come from the same camp as most XML programmers I've
encountered, which I would sum up as "XML is for documents, why would you
want it to do anything more?"  Our system is a web-based authoring system
that parses the XHTML/AML (AML being our own markup extensions) once and
holds it in memory as an executable tree, so that when a request comes in
for that page, a completely new HTML document can be created from the
executable XHTML template simply by running the chain.  If all the entities
in attribute values (which we use as dynamic macros) are silently replaced
by the parser, we have no way of dynamically inserting the proper macro
value into the generated HTML.  As for the attribute's raw text, I need
some way to tell which attribute values came from entities expanded by the
processor and which were entered by the user.  Knowing which attributes are
defaults and which actually came from the XML document is also very handy
for saving space in the downloaded HTML.  I know that a SAX parser
absolutely can handle these changes, since I have a modified version of the
AElfred2 parser that currently supports them.
    (*Chris*)
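
For readers who want to see the behaviour under discussion, here is a
minimal sketch, not code from either poster. It assumes the parser bundled
with a current JDK and uses DefaultHandler2 and Attributes2, which come from
later SAX2 extension releases than this thread. The class name, sample
document and report format are all hypothetical. It lists each attribute as
the handler sees it and records any general entity boundaries the parser
chooses to report.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.ext.DefaultHandler2;

class AttributeEntityDemo {

    // Hypothetical sample: "title" uses an entity reference in its value,
    // and "lang" is never written in the document, only defaulted by the DTD.
    static final String SAMPLE =
        "<!DOCTYPE page ["
        + " <!ENTITY user 'Chris'>"
        + " <!ATTLIST page lang CDATA 'en'>"
        + " ]>"
        + "<page title='Hello &user;'>body</page>";

    /** Returns one line per attribute seen and per general entity reported. */
    static List<String> report(String xml) throws Exception {
        final List<String> lines = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    String line = "attr " + atts.getQName(i) + "=" + atts.getValue(i);
                    // Attributes2 is optional; only use it if the parser supplies it.
                    if (atts instanceof Attributes2
                            && !((Attributes2) atts).isSpecified(i)) {
                        line += " (defaulted)";
                    }
                    lines.add(line);
                }
            }

            @Override
            public void startEntity(String name) {
                // Ignore the "[dtd]" pseudo-entity and parameter entities.
                if (!name.startsWith("[") && !name.startsWith("%")) {
                    lines.add("entity " + name);
                }
            }
        };
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : report(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

The handler only ever receives the expanded attribute value, and the SAX
documentation for LexicalHandler lists general entities within attribute
values among the boundaries that cannot be reported. That is the gap the
message above is describing.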

----- Original Message -----
From: "Vilya Harvey" <vilya.harvey@oxford.qss.co.uk>
To: "XML Development Interest Group" <xml-dev@lists.xml.org>
Sent: Wednesday, July 12, 2000 1:55 AM
Subject: Re: SAX2 Lexical Handler Suggestions


> Chris Pratt wrote:
> > First of all, what is the utility of the startEntity() and endEntity()
> > methods of the LexicalHandler?  The end of the entity has definitely
> > been parsed before the call to startEntity since the name of the entity
> > is a parameter.  And since a single entity can't bracket information
> > (like an element does) there is no utility in the endEntity() method,
> > unless I'm missing something obvious.  In this case, I would suggest we
> > rename the startEntity() method to simply entity() and remove the
> > endEntity() method.
>
> startEntity() and endEntity() respectively indicate the start and end of
> the _replacement_text_ of an entity reference. This replacement text may
> itself contain characters which cause other handler methods to be
> invoked; that's why there need to be start & end methods for it. This
> could probably be made a little clearer in the documentation. I think the
> confusion mainly springs from the common misuse of the term 'entity' to
> mean 'entity reference', while SAX uses the term in its proper sense to
> mean the block of characters that are indicated by the reference.
>
> > Also many systems (mine included) need to be able to tell the
> > difference between an entity in element data and an entity as an
> > attribute value, so I'd suggest adding a boolean parameter to the
> > entity() method specifying which of the two possible uses of entities
> > has been found (i.e. public void entity (String name,boolean isAttr);).
>
> Because of the way SAX has been designed to work, entities in attributes
> cannot be reported. They are just resolved silently by the parser.
>
> Why do you need to know when an entity is being processed? Apologies in
> advance if this treads on your (or anyone else's) toes, but I generally
> find that this "need" to know stems from a misunderstanding of their
> purpose. In some ways they are analogous to preprocessor macros in C:
> they get expanded and the result is processed as if it was part of the
> original document. The only thing that needs to know about preprocessor
> macros is the compiler (for generating debug information); likewise, the
> only thing that really needs to know about entity references (generally
> speaking) is the parser. They are essentially a shorthand mechanism,
> although they also allow external documents to be included.
>
> > My final suggestion is a method to detect when attributes are
> > encountered being that there is a great amount of information that can
> > be disseminated in an attribute definition.  My current hack of the
> > AElfred parser defines a SAX2 Extension handler that supports the
> > following call:
> <snip>
>
> All of the information in your attribute() method, with the exception of
> the raw attribute text, is available through the existing interface. As
> regards the raw attribute text, see my comments above. If you really do
> want to process entity reference names (particularly in attributes), I
> would suggest grabbing the source code to a parser (you obviously already
> have AElfred) and adding your own non-SAX API to it, rather than changing
> SAX to support this.
>
> Hope that helps,
> Vil.
>
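
The point in the quoted reply, that startEntity() and endEntity() bracket
replacement text which can itself trigger other callbacks, can be sketched
as follows. This is an illustration under assumptions rather than anything
from the thread: it relies on the parser shipped with a current JDK and on
DefaultHandler2 from a later SAX2 extensions release, and the class name and
sample document are hypothetical. The entity's replacement text contains an
element, so element events are expected to fall between the two entity
events.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class EntityBoundaryDemo {

    // Hypothetical sample: the replacement text of &sig; contains markup.
    static final String SAMPLE =
        "<!DOCTYPE doc [ <!ENTITY sig '<b>Chris</b> Pratt'> ]>"
        + "<doc>By &sig;</doc>";

    /** Returns element and general-entity boundary events in document order. */
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                events.add("startElement " + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("endElement " + qName);
            }

            @Override
            public void startEntity(String name) {
                if (isGeneralEntity(name)) events.add("startEntity " + name);
            }

            @Override
            public void endEntity(String name) {
                if (isGeneralEntity(name)) events.add("endEntity " + name);
            }
        };
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // LexicalHandler is an optional extension, registered as a property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    // Skip the "[dtd]" pseudo-entity and parameter entities ("%name").
    private static boolean isGeneralEntity(String name) {
        return !name.startsWith("[") && !name.startsWith("%");
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace(SAMPLE)) {
            System.out.println(event);
        }
    }
}
```

A single entity() callback could name the reference, but it could not tell
a handler which of the following element and character events came from
inside the replacement text. The paired calls carry that information.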
