- From: David Megginson <ak117@freenet.carleton.ca>
- To: Tim Bray <tbray@textuality.com>
- Date: Sat, 13 Dec 1997 21:01:04 -0500
Tim Bray writes:
> > attribute(XmlParser, String, String, boolean)
>
> It seems completely wrong to have an attribute event separate from
> start-element events.
I have worried about this myself. My design goal with Ælfred has been
to limit myself to two class files: one for the parser itself, and one
for the interface for the callbacks -- hence the separate event for
attributes. This decision has forced some pretty severely hacked-up
internal code accompanied by very careful documentation.
I could send a hashtable of attribute names and values with the
startElement() callback, and let users look up types (etc.) with my
query methods, but I would have to lose a bit on two counts:
1) Allocating a new hashtable for every start tag will slow down the
parser a fair bit.
2) I'd have no way to show which attributes were specified and which
were defaulted (see below).
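The two callback shapes being weighed here can be sketched roughly as follows. This is an illustration only: the interface and class names are hypothetical, the parser argument is left out, and reporting attributes before the start-element event is an assumption, not a description of Ælfred's actual interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Style A (hypothetical): one callback per attribute, then the element itself.
interface SeparateAttributeHandler {
    void attribute(String name, String value, boolean isSpecified);
    void startElement(String name);
}

// Style B (hypothetical): attributes bundled into a map on startElement.
interface BundledAttributeHandler {
    void startElement(String name, Map<String, String> attributes);
}

class AttributeEventSketch {
    // Reports <doc id="a1" lang="en"> in style A, where lang comes from a default.
    static List<String> reportSeparately() {
        final List<String> log = new ArrayList<>();
        SeparateAttributeHandler handler = new SeparateAttributeHandler() {
            public void attribute(String name, String value, boolean isSpecified) {
                log.add(name + "=" + value + (isSpecified ? " (specified)" : " (defaulted)"));
            }
            public void startElement(String name) {
                log.add("start " + name);
            }
        };
        handler.attribute("id", "a1", true);
        handler.attribute("lang", "en", false);
        handler.startElement("doc");
        return log;
    }

    // Reports the same start tag in style B: one new map per start tag,
    // and no slot for the specified/defaulted flag.
    static List<String> reportBundled() {
        final List<String> log = new ArrayList<>();
        BundledAttributeHandler handler = new BundledAttributeHandler() {
            public void startElement(String name, Map<String, String> attributes) {
                log.add("start " + name + " " + attributes);
            }
        };
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("id", "a1");
        attributes.put("lang", "en");
        handler.startElement("doc", attributes);
        return log;
    }
}
```

In style B the handler sees everything in one call, but the map has to be allocated per start tag and the specified/defaulted distinction has nowhere to go unless the value type is widened.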
> What's the boolean? I don't think the application author should
> have to deal with anything but the name and value of attributes.
The boolean tells whether the attribute was specified or defaulted. I
include this to allow people to do useful XML-to-XML transformations.
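As a rough illustration of why the flag matters for XML-to-XML work, a handler that re-serializes start tags can use it to leave defaulted attributes out of its output. The class below is a hypothetical sketch (the name StartTagWriter, the method shapes, and the escaping rules are assumptions), not code from Ælfred.

```java
// Hypothetical handler that rebuilds a start tag from attribute events,
// writing only the attributes that were actually specified in the source.
class StartTagWriter {
    private final StringBuilder pending = new StringBuilder();

    // Mirrors the attribute(name, value, isSpecified) event, minus the parser argument.
    void attribute(String name, String value, boolean isSpecified) {
        if (isSpecified) {
            pending.append(' ').append(name)
                   .append("=\"").append(escape(value)).append('"');
        }
    }

    // Emits the start tag and clears the attributes collected for it.
    String startElement(String name) {
        String tag = "<" + name + pending + ">";
        pending.setLength(0);
        return tag;
    }

    // Minimal escaping for a double-quoted attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

Without the flag, the writer would have to copy every defaulted value into the output, and the result would no longer round-trip to the original markup.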
> > data(XmlParser, String)
>
> I feel that the 2nd argument should not be a String. It is a recipe
> for disastrous inefficiency if the processor has to cook up a
> java.lang.String object for every little chunk of text.
The overhead isn't that bad with Ælfred because I coalesce my data
into the largest chunks possible before allocating the String. I
think that returning a char[] array would be confusing for users, and
would lead to many bugs in their code as they ignored our warnings not
to rely on the value in the char[] array outlasting the callback.
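The hazard with a reused char[] buffer can be shown with a small simulation. Everything here is hypothetical (the class, the charData name, the buffer size); it only assumes a parser that hands the same buffer to every callback and overwrites it afterwards.

```java
// Simulates a parser that reuses one buffer for every chunk of text.
class BufferReuseDemo {
    static char[] kept;    // reference retained by a careless handler
    static String copied;  // copy made inside the callback

    // Hypothetical callback: the buffer is only valid until it returns.
    static void charData(char[] buffer, int length) {
        kept = buffer;                           // bug: keeps the parser's buffer
        copied = new String(buffer, 0, length);  // safe: copies the characters
    }

    static void simulate() {
        char[] buffer = new char[16];
        "first".getChars(0, 5, buffer, 0);
        charData(buffer, 5);
        // The parser moves on and overwrites the same buffer with the next chunk.
        "other".getChars(0, 5, buffer, 0);
    }
}
```

After simulate() runs, the retained array no longer holds the text the handler saw, while the copied String still does. That copy is exactly the allocation the char[] interface leaves to the application.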
> Lark uses two
> arguments, a char[] array and a character count; the app can
> make a String if it needs to. If you find this awkward, create
> a new data type called Text so that if you need a String you
> can make it with lazy-evaluation in Text.toString(), but if you
> don't need it you don't build it.
Again, I'm reluctant to create new classes beyond XmlParser and
XmlProcessor.
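For reference, the lazily evaluated Text type Tim describes might look something like the sketch below. Only the class name and toString() come from his message; the fields and other methods are assumptions, and this is not Lark's code.

```java
// Hypothetical wrapper over a slice of the parser's buffer.
// A String is built only if the application asks for one.
final class Text {
    private final char[] buffer;
    private final int offset;
    private final int length;
    private String cached;  // built on first toString() call

    Text(char[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buffer[offset + index];
    }

    // Copies the characters once; later calls reuse the cached String.
    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(buffer, offset, length);
        }
        return cached;
    }
}
```

Because the object still points at the parser's buffer, it shares the lifetime problem of a bare char[]: unless toString() is called during the callback, its contents may change once the parser reuses the buffer. It also costs one more class file.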
> Also, it shouldn't be named "data" - it should be named
> characterData or charData or text or some such term that can
> be mapped directly to the spec.
Agreed. I will not change Ælfred now, but I think that this is a good
idea.
> > resolveEntity(XmlParser, String, String, URL)
>
> I don't think entities have any place in the first cut of this
> interface. The processor exists to make these problems go away.
Normally, you should just return the URL argument; however, this
callback gives users a chance to do public-identifier resolution, URL
substitution, etc., and to return a different URL if desired. For
example, if we had a DTD at
http://www.microstar.com/XML/msldoc.dtd
and you had a local copy, you could substitute a local URL on your own
computer. Likewise, you could do a catalogue lookup on the public
identifier "-//microstar//DTD Microstar Sample Document//EN" and
choose a different system identifier than the default supplied in the
document.
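A catalogue-style resolver along these lines could be sketched as below. The argument order (public identifier, system identifier, default URL), the class name, and the local file path are all assumptions for illustration; the parser argument is omitted.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver: maps public identifiers to local copies and
// otherwise returns the URL the parser proposed.
class CatalogResolver {
    private final Map<String, URL> byPublicId = new HashMap<>();

    void map(String publicId, String localUrl) {
        byPublicId.put(publicId, url(localUrl));
    }

    // Assumed shape of the resolveEntity callback.
    URL resolveEntity(String publicId, String systemId, URL defaultUrl) {
        URL local = (publicId == null) ? null : byPublicId.get(publicId);
        return (local != null) ? local : defaultUrl;
    }

    // Helper that turns a malformed URL into an unchecked exception.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

A caller might register the public identifier above against a local file, for example map("-//microstar//DTD Microstar Sample Document//EN", "file:///usr/local/dtd/msldoc.dtd") with a made-up path; any entity without a catalogue entry falls through to the URL from the document.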
That said, I agree that this probably doesn't belong in the common
event API.
> Generalities:
> Lark has a thing where if any callback returns 'true', the
> parser drops out of its loop... which is awfully useful and easy
> I think. Lark will also re-enter, but this need not be a requirement.
Awfully easy with a DFA-driven parser, but trickier with a
recursive-descent parser like Ælfred. I'd probably have to throw an
exception, and could not allow any kind of re-entry.
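To make the difficulty concrete, here is a toy recursive-descent parser that stops when a callback returns true. The grammar (element := name '(' element* ')'), the class names, and the exception type are all invented for this sketch, which assumes well-formed input; it is not how Ælfred or Lark work internally.

```java
// Callback for the toy parser: return true to ask the parser to stop.
interface StopHandler {
    boolean startElement(String name);
}

// Private signal used to unwind the recursion.
final class StopParsing extends RuntimeException {
}

// Toy recursive-descent parser for input such as "doc(head()body(p()))".
final class ToyParser {
    private final String in;
    private final StopHandler handler;
    private int pos;

    ToyParser(String in, StopHandler handler) {
        this.in = in;
        this.handler = handler;
    }

    // Returns true if the whole input was parsed, false if a callback stopped it.
    boolean parse() {
        try {
            element();
            return true;
        } catch (StopParsing e) {
            return false;
        }
    }

    private void element() {
        int start = pos;
        while (pos < in.length() && Character.isLetter(in.charAt(pos))) {
            pos++;
        }
        String name = in.substring(start, pos);
        if (handler.startElement(name)) {
            throw new StopParsing();  // abandons every enclosing element() call
        }
        pos++;  // skip '('
        while (pos < in.length() && in.charAt(pos) != ')') {
            element();
        }
        pos++;  // skip ')'
    }
}
```

The parse state lives in the chain of nested element() calls, so once the exception has unwound the stack there is nothing left to resume from. A table-driven parser keeps that state in explicit variables, which is why dropping out and re-entering is straightforward there.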
> Also, for application programmers, especially dealing with smallish
> objects, a tree interface is very natural. I've written both
> event-stream and tree apps using Lark, and the trees are a lot
> easier to use for anything even moderately complex. So the API
> should have Element, Attribute, and Text classes.
Perhaps -- I may have to give in and allow Ælfred to use more than one
class file; or alternatively, these would be an optional extra, along
with the SAX-J layer.
> And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple
> API for XML? Maybe SAX-J for the Java bindings. -Tim
How about RUSTY?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)