- From: David Megginson <firstname.lastname@example.org>
- To: Tim Bray <email@example.com>
- Date: Sat, 13 Dec 1997 21:01:04 -0500
Tim Bray writes:
> > attribute(XmlParser, String, String, boolean)
> It seems completely wrong to have an attribute event separate from
> start-element events.
I have worried about this myself. My design goal with Ælfred has been
to limit myself to two class files: one for the parser itself, and one
for the interface for the callbacks -- hence the separate event for
attributes. This decision has forced some pretty severely hacked-up
internal code accompanied by very careful documentation.
I could send a hashtable of attribute names and values with the
startElement() callback, and let users look up types (etc.) with my
query methods, but I would have to lose a bit on two counts:
1) Allocating a new hashtable for every start tag will slow down the
parser a fair bit.
2) I'd have no way to show which attributes were specified and which
were defaulted (see below).
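One way an application might cope with separate attribute events is to collect them into a single map and read that map when the start-tag event arrives. The following is a minimal sketch. The `Callbacks` interface, its method signatures (the `XmlParser` argument is dropped) and the assumption that attribute events come before the start-element event for the same tag are all hypothetical and are not Ælfred's actual API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AttributeDemo {
    // Hypothetical callback interface (XmlParser argument omitted).
    // Assumes attribute events arrive before the start-element event for the same tag.
    interface Callbacks {
        void attribute(String name, String value, boolean isSpecified);
        void startElement(String elementName);
    }

    // Collects attribute events into one map that is cleared and reused for
    // every start tag, so no new table is allocated per element.
    static final class CollectingHandler implements Callbacks {
        private final Map<String, String> pending = new LinkedHashMap<>();
        final StringBuilder log = new StringBuilder();

        @Override
        public void attribute(String name, String value, boolean isSpecified) {
            pending.put(name, value);
        }

        @Override
        public void startElement(String elementName) {
            log.append(elementName).append(pending).append('\n');
            pending.clear(); // ready for the next start tag
        }
    }

    // Replays the events a parser might send for <doc id="d1"><p class="x"/></doc>.
    static String run() {
        CollectingHandler h = new CollectingHandler();
        h.attribute("id", "d1", true);
        h.startElement("doc");
        h.attribute("class", "x", true);
        h.startElement("p");
        return h.log.toString();
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```

Reusing one map puts the per-start-tag allocation cost under the application's control rather than the parser's.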
> What's the boolean? I don't think the application author should
> have to deal with anything but the name and value of attributes.
The boolean tells whether the attribute was specified or defaulted. I
include this to allow people to do useful XML-to-XML transformations.
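To show why the flag matters for XML-to-XML work, here is a minimal sketch that re-serializes a start tag and drops attributes whose values came from DTD defaults. The `Attr` class and `startTag` method are hypothetical, and attribute values are not escaped.

```java
import java.util.List;

class SpecifiedDemo {
    // One attribute as a hypothetical parser might report it: name, value, and
    // whether it appeared in the document (true) or was defaulted from the DTD (false).
    static final class Attr {
        final String name;
        final String value;
        final boolean specified;

        Attr(String name, String value, boolean specified) {
            this.name = name;
            this.value = value;
            this.specified = specified;
        }
    }

    // Writes only the attributes the author actually typed, so DTD defaults
    // are not copied into the output document. Values are NOT escaped here.
    static String startTag(String element, List<Attr> attrs) {
        StringBuilder sb = new StringBuilder("<").append(element);
        for (Attr a : attrs) {
            if (a.specified) {
                sb.append(' ').append(a.name).append("=\"").append(a.value).append('"');
            }
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // "type" is assumed to have been defaulted from the DTD.
        System.out.println(startTag("para", List.of(
                new Attr("id", "p1", true),
                new Attr("type", "normal", false))));
    }
}
```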
> > data(XmlParser, String)
> I feel that the 2nd argument should not be a String. It is a recipe
> for disastrous inefficiency if the processor has to cook up a
> java.lang.String object for every little chunk of text.
The overhead isn't that bad with Ælfred because I coalesce my data
into the largest chunks possible before allocating the String. I
think that returning a char array would be confusing for users, and
would lead to many bugs in their code as they ignored our warnings not
to rely on the value in the char array outlasting the callback.
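The lifetime problem with a char-array callback can be shown with a simulated parser that reuses one internal buffer. It reports text through a hypothetical `charData(char[], int, int)` callback in the char-array-plus-count style. The names and the buffer reuse are assumptions made for illustration. A handler that keeps the array sees earlier text overwritten. Copying with `new String(ch, start, length)` inside the callback avoids this.

```java
import java.util.ArrayList;
import java.util.List;

class CharBufferDemo {
    // Hypothetical callback in the char-array-plus-count style.
    interface CharHandler {
        void charData(char[] ch, int start, int length);
    }

    // Simulated parser: copies each chunk into ONE reused buffer before the callback.
    // (Assumes every chunk fits in 64 chars.)
    static void parse(String[] chunks, CharHandler handler) {
        char[] buffer = new char[64];
        for (String chunk : chunks) {
            chunk.getChars(0, chunk.length(), buffer, 0);
            handler.charData(buffer, 0, chunk.length());
        }
    }

    // Wrong: keeps a reference to the parser's buffer and reads it after the callback.
    static List<String> careless(String[] chunks) {
        List<char[]> arrays = new ArrayList<>();
        List<int[]> spans = new ArrayList<>();
        parse(chunks, (ch, start, length) -> {
            arrays.add(ch);
            spans.add(new int[] {start, length});
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < arrays.size(); i++) {
            int[] span = spans.get(i);
            out.add(new String(arrays.get(i), span[0], span[1]));
        }
        return out;
    }

    // Right: copies the characters into a String before the callback returns.
    static List<String> careful(String[] chunks) {
        List<String> out = new ArrayList<>();
        parse(chunks, (ch, start, length) -> out.add(new String(ch, start, length)));
        return out;
    }

    public static void main(String[] args) {
        String[] chunks = {"Hello", "XML"};
        System.out.println("careless: " + careless(chunks));
        System.out.println("careful:  " + careful(chunks));
    }
}
```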
> Lark uses two
> arguments, a char array and a character count; the app can
> make a String if it needs to. If you find this awkward, create
> a new data type called Text so that if you need a String you
> can make it with lazy-evaluation in Text.toString(), but if you
> don't need it you don't build it.
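Tim's lazy-evaluation idea might look roughly like the following sketch. The class name `Text` comes from his message. The constructor, `isWhitespace()` and `isMaterialized()` are hypothetical additions for illustration. The lazy String only saves work. It does not remove the lifetime issue if the wrapped array is the parser's reusable buffer.

```java
class LazyTextDemo {
    // Hypothetical Text type: a view of a slice of characters that builds a
    // java.lang.String only on the first call to toString().
    // Caveat: if ch is the parser's reusable buffer, a Text is only safe to
    // read during the callback, just like a raw char array.
    static final class Text {
        private final char[] ch;
        private final int start;
        private final int length;
        private String cached; // stays null until a String is requested

        Text(char[] ch, int start, int length) {
            this.ch = ch;
            this.start = start;
            this.length = length;
        }

        int length() {
            return length;
        }

        // Answers a common question (is this ignorable whitespace?) without allocating.
        boolean isWhitespace() {
            for (int i = start; i < start + length; i++) {
                if (!Character.isWhitespace(ch[i])) return false;
            }
            return true;
        }

        boolean isMaterialized() {
            return cached != null;
        }

        @Override
        public String toString() {
            if (cached == null) cached = new String(ch, start, length);
            return cached;
        }
    }

    public static void main(String[] args) {
        char[] buf = "   Hello".toCharArray();
        Text ws = new Text(buf, 0, 3);
        Text word = new Text(buf, 3, 5);
        System.out.println(ws.isWhitespace() + " " + ws.isMaterialized());
        String s = word.toString(); // first request builds the String
        System.out.println(s + " " + word.isMaterialized());
    }
}
```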
Again, I'm reluctant to create new classes beyond XmlParser and the callback interface.
> Also, it shouldn't be named "data" - it should be named
> characterData or charData or text or some such term that can
> be mapped directly to the spec.
Agreed. I will not change Ælfred now, but I think that this is a good
> > resolveEntity(XmlParser, String, String, URL)
> I don't think entities have any place in the first cut of this
> interface. The processor exists to make these problems go away.
Normally, you should just return the URL argument; however, this
callback gives users a chance to do public-identifier resolution, URL
substitution, etc., and to return a different URL if desired. For
example, if we had a DTD at
and you had a local copy, you could substitute a local URL on your own
computer. Likewise, you could do a catalogue lookup on the public
identifier "-//microstar//DTD Microstar Sample Document//EN" and
choose a different system identifier than the default supplied in the
That said, I agree that this probably doesn't belong in the common
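Here is a minimal sketch of the catalogue lookup described above. It assumes a hypothetical resolver class with an in-memory map from public identifiers to local copies. The method shape follows `resolveEntity(XmlParser, String, String, URL)` with the parser argument dropped, and the two strings are assumed to be the public and system identifiers. The class and the URLs are placeholders.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

class ResolverDemo {
    // Hypothetical catalogue: public identifier -> URL of a local copy.
    private final Map<String, URL> catalog = new HashMap<>();

    void map(String publicId, URL localCopy) {
        catalog.put(publicId, localCopy);
    }

    // Hypothetical callback shape (XmlParser argument omitted; the strings are
    // assumed to be the public and system identifiers).
    // Returning defaultUrl keeps the parser's own choice.
    URL resolveEntity(String publicId, String systemId, URL defaultUrl) {
        if (publicId != null) {
            URL local = catalog.get(publicId);
            if (local != null) {
                return local;
            }
        }
        return defaultUrl;
    }

    // Small helper so callers need not handle the checked exception.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    public static void main(String[] args) {
        String pid = "-//microstar//DTD Microstar Sample Document//EN";
        ResolverDemo resolver = new ResolverDemo();
        // Both URLs below are made-up placeholders, not real locations.
        resolver.map(pid, url("file:/usr/local/share/dtd/sample.dtd"));
        URL remote = url("http://www.example.com/sample.dtd");
        System.out.println(resolver.resolveEntity(pid, "sample.dtd", remote));
        System.out.println(resolver.resolveEntity(null, "other.dtd", remote));
    }
}
```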
> Lark has a thing where if any callback returns 'true', the
> parser drops out of its loop... which is awfully useful and easy
> I think. Lark will also re-enter, but this need not be a requirement.
Awfully easy with a DFA-driven parser, but trickier with a
recursive-descent parser like Ælfred. I'd probably have to throw an
exception, and could not allow any kind of re-entry.
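A toy recursive-descent parser can illustrate the exception-based exit. The grammar, the `Handler` interface and the `StopParsing` exception are all hypothetical. Throwing unwinds every nested call. The caller learns that the parse stopped, but it has no way to re-enter where the parse left off.

```java
import java.util.ArrayList;
import java.util.List;

class StopDemo {
    // Hypothetical Lark-style callback: returning true asks the parser to stop.
    interface Handler {
        boolean startElement(String name);
    }

    // Unchecked exception used only to unwind the recursion; no stack trace is recorded.
    private static final class StopParsing extends RuntimeException {
        StopParsing() {
            super(null, null, false, false);
        }
    }

    private final String in;
    private final Handler handler;
    private int pos;

    private StopDemo(String in, Handler handler) {
        this.in = in;
        this.handler = handler;
    }

    // Toy grammar: element := name '(' element* ')', e.g. "doc(p()p())".
    // Returns true if the handler stopped the parse early.
    static boolean parse(String in, Handler handler) {
        StopDemo parser = new StopDemo(in, handler);
        try {
            parser.element();
            return false;
        } catch (StopParsing stop) {
            // Every nested element() frame is gone, so the parse cannot resume.
            return true;
        }
    }

    private void element() {
        int begin = pos;
        while (pos < in.length() && Character.isLetter(in.charAt(pos))) {
            pos++;
        }
        if (begin == pos) {
            throw new IllegalStateException("expected a name at " + pos);
        }
        if (handler.startElement(in.substring(begin, pos))) {
            throw new StopParsing();
        }
        expect('(');
        while (pos < in.length() && in.charAt(pos) != ')') {
            element(); // recursive descent into child elements
        }
        expect(')');
    }

    private void expect(char c) {
        if (pos >= in.length() || in.charAt(pos) != c) {
            throw new IllegalStateException("expected '" + c + "' at " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        boolean stopped = parse("doc(head()body(p()p()))", name -> {
            seen.add(name);
            return name.equals("body"); // stop as soon as body starts
        });
        System.out.println(seen + " stopped=" + stopped);
    }
}
```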
> Also, for application programmers, especially dealing with smallish
> objects, a tree interface is very natural. I've written both
> event-stream and tree apps using Lark, and the trees are a lot
> easier to use for anything even moderately complex. So the API
> should have Element, Attribute, and Text classes.
Perhaps -- I may have to give in and allow Ælfred to use more than one
class file; or alternatively, these would be an optional extra, along
with the SAX-J layer.
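A tree layer can sit on top of the event stream, whether it is built into the parser or shipped as an optional extra. Here is a minimal sketch. It assumes hypothetical `startElement`/`charData`/`endElement` callbacks and bare-bones `Element` and `Text` classes, with attributes omitted. It is not Ælfred's or Lark's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TreeDemo {
    // Hypothetical optional tree layer: just enough of Element and Text to show the idea.
    interface Node {}

    static final class Text implements Node {
        final String data;
        Text(String data) { this.data = data; }
        @Override public String toString() { return "\"" + data + "\""; }
    }

    static final class Element implements Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Element(String name) { this.name = name; }
        @Override public String toString() { return name + children; }
    }

    // Builds a tree from event callbacks, keeping open elements on a stack.
    static final class TreeBuilder {
        private final Deque<Element> open = new ArrayDeque<>();
        private Element root;

        void startElement(String name) {
            Element e = new Element(name);
            if (open.isEmpty()) root = e; else open.peek().children.add(e);
            open.push(e);
        }

        void charData(String text) {
            if (!open.isEmpty()) open.peek().children.add(new Text(text));
        }

        // A fuller version would check that name matches the element being closed.
        void endElement(String name) {
            open.pop();
        }

        Element root() { return root; }
    }

    // Replays the events for <doc><p>Hello</p><p/></doc>.
    static Element build() {
        TreeBuilder b = new TreeBuilder();
        b.startElement("doc");
        b.startElement("p");
        b.charData("Hello");
        b.endElement("p");
        b.startElement("p");
        b.endElement("p");
        b.endElement("doc");
        return b.root();
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```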
> And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple
> API for XML? Maybe SAX-J for the Java bindings. -Tim
How about RUSTY?
All the best,
David Megginson firstname.lastname@example.org
Microstar Software Ltd. email@example.com
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:email@example.com the following message;
To subscribe to the digests, mailto:firstname.lastname@example.org the following message;
List coordinator, Henry Rzepa (mailto:email@example.com)