- From: firstname.lastname@example.org (Ron Bourret)
- To: email@example.com, SimonStL@classic.msn.com
- Date: Mon, 25 May 1998 14:31:51 +0200
Simon St. Laurent wrote:
> I feel strongly that this project will need an implementation, though I also
> fear that I'm not a good programmer to execute it. I'd like to see the
> implementation built on SAX if possible, to continue the tradition of openness
> it began. I can see something like a 'validating SAX', (vSAX?) a program
> which uses the SAX API to parse a DTD (or whatever we call it) and then uses
> SAX again to parse the document, validating it against the DTD. vSAX would
> then use the same SAX API to pass the information to the routine which called
> it in the first place. Applications already using SAX could call vSAX without
> having to make many changes.
> This may go beyond the capabilities of the event-driven model. Building this
> project in such a way that the vSAX parser could validate documents without
> having to build an entire tree would likely warp the DTDs dramatically. That
> could be interesting, but I suspect vSAX would have to build a tree
I might be getting a bit ahead of the game here, so please bear with me -- these
thoughts are in my head now and I'd like to get them down.
Trees vs. Events
It seems like we need to decide early on whether we are interested in getting
the DTD as events or a tree. One argument for events is that it is more
reasonable to build a tree from events than vice versa (less memory usage),
so events are the more basic form. However, I also think that what is returned
really depends on intended usage.
In my limited imagination, events are mostly useful for display -- read in the
DTD definition-by-definition and display it. This is a common operation with
the text in an XML document and is presumably why SAX returns events. Except
for displaying a DTD or building a tree, how else would DTD events be used?
The two prime uses of DTDs that I can think of are validation and exploration.
Both of these require the information to stay in memory and be accessed
randomly, which (to me) implies a tree, hash table, or similar structure. Are
there any common uses of DTDs that require serial access?
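To make the events-versus-tree trade-off concrete, here is a minimal sketch in Python. It is not part of the proposal. It uses the standard library's expat binding, which reports each element declaration as an event, and collects those events into a hash table for random access. The names DOC, child_names, and load_declarations are invented for illustration.

```python
import xml.parsers.expat

# A tiny document whose internal DTD subset declares two element types.
DOC = """<!DOCTYPE a [
<!ELEMENT a (b)>
<!ELEMENT b (#PCDATA)>
]>
<a><b>text</b></a>"""


def child_names(model):
    """Collect every element name mentioned in an expat content model.

    expat passes the model as a nested tuple:
    (type, quantifier, name, children).
    """
    _type, _quant, name, children = model
    names = [name] if name is not None else []
    for child in children:
        names.extend(child_names(child))
    return names


def load_declarations(text):
    """Turn the serial stream of declaration events into a lookup table."""
    table = {}

    def on_element_decl(name, model):
        # One event per <!ELEMENT ...> declaration, in document order.
        table[name] = child_names(model)

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(text, True)
    return table


declarations = load_declarations(DOC)
print(declarations)
```

The events arrive serially, but validation and exploration both query the resulting table by name. That fits the point above that events are the more basic form while the in-memory structure is what most consumers want.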
Flat Trees vs. Tree Trees
If trees are used, another question is what form the tree takes. XML-Data
currently defines a tree that uses XML's hierarchy as a way to group information
about individual elements. However, the relation between those elements is
actually flat. For example, the following DTD converts to the following
XML-Data schema:

   <!DOCTYPE a [
   <!ELEMENT a (b)>
   <!ELEMENT b (#PCDATA)>
   ]>

   <schema id = "a">
      <elementType id = "a">
         <element type = "#b"/>
      </elementType>
      <elementType id = "b">
      </elementType>
   </schema>

Notice that the definitions of a and b are at the same level. That is, when I
build a DOM tree from this XML, a and b are siblings, not parent and child.
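The flatness can be checked mechanically. The sketch below is in Python and uses ElementTree rather than a DOM. It assumes the schema is well-formed, with every elementType closed; the closing tags and the empty body of "b" are assumptions. The names top_level_ids and index_types are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The example schema, closed with the end tags assumed above.
SCHEMA = """<schema id="a">
  <elementType id="a">
    <element type="#b"/>
  </elementType>
  <elementType id="b"/>
</schema>"""


def top_level_ids(schema_text):
    """Return the ids of the direct children of <schema>, in order."""
    root = ET.fromstring(schema_text)
    return [child.get("id") for child in root]


def index_types(schema_text):
    """Map each elementType id to the ids it references via <element type="#...">."""
    root = ET.fromstring(schema_text)
    return {
        et.get("id"): [ref.get("type").lstrip("#") for ref in et.findall("element")]
        for et in root.findall("elementType")
    }


print(top_level_ids(SCHEMA))
print(index_types(SCHEMA))
```

Both definitions are direct children of the schema element. The only trace of the a-to-b relationship is the "#b" reference, which has to be resolved by lookup rather than by walking down the tree.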
When exploring a DTD, the parent-child relationship is far nicer -- I move up
and down the DOM tree and get the metadata I need at each level. On the other
hand, such a tree complicates the DTD (sorry ;) for XSchema/XTD/etc. and I'm not
sure if representing children with multiple parents would even be possible,
given the strict nesting requirements of XML. Comments?
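A small sketch of the multiple-parent problem, in Python. The flat table FLAT and its element types "c" and "d" are hypothetical and are not taken from the example above. The table is expanded into a nested tree the way an exploration tool might want it, and "d" is allowed under both "b" and "c".

```python
# Hypothetical flat schema: element type -> allowed child element types.
FLAT = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def expand(name, flat, path=()):
    """Expand a flat table into a nested parent-child tree.

    A type that already appears on the path from the root is marked as
    recursive and not expanded again, so cyclic content models terminate.
    """
    if name in path:
        return {"name": name, "recursive": True, "children": []}
    return {
        "name": name,
        "recursive": False,
        "children": [expand(child, flat, path + (name,)) for child in flat[name]],
    }


def count(node, target):
    """Count how many nodes in the nested tree carry the given name."""
    here = 1 if node["name"] == target else 0
    return here + sum(count(child, target) for child in node["children"])


tree = expand("a", FLAT)
print(count(tree, "d"))
```

Because strict nesting gives each node exactly one parent, the definition of "d" ends up copied once per parent. A nested representation has to either duplicate shared definitions or fall back to references, which brings back the flat form.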
-- Ron Bourret
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/