Just got back from taking my daughter to the dentist! (Home from
college, now making more appointments for the breaks!)
Jeni Tennison wrote:
>>>The document that you quote is not a normative definition of XML.
>>>There are many normative definitions of data models for XML,
>>>including the Infoset and XPath.
>>And it is luck that they all follow a tree based model?
>Not luck. Firstly, XML syntax is structured to support a tree-based
>model, because one of its well-formedness rules is that elements must
>be nested inside each other. Secondly, a tree-based model is an
>excellent choice for implementations since it makes it easy to keep
>track of context and to focus processing onto particular subtrees;
>these are advantages that are missed by APIs that don't assume a
>tree-based model, such as SAX.
I have not tried to claim that a tree model is not useful. The DOM
"Lite" example is a case of what I suppose would be called a partial
tree: one that uses the tree model only so far as it is useful in a
given context. That approach avoids the shortcomings of an approach [...]
It is the singleness of the tree (or well-formedness, if you like) for
any given instance that creates the problem. To use an example without
overlapping markup, consider the DOM example we have been batting back
and forth. If we accept your view of XML syntax, I can:
1. Process the entire document into a DOM tree, or:
2. Filter the document to remove part of the markup that forms part of
the tree and then create a DOM tree.
I think we would agree that option #1 is the most intensive, in terms of
both processing and memory.
But I don't find #2 all that attractive since I have to reparse the
document (probably using SAX) to get back to the part of interest that
still has all the markup in it.
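Options #1 and #2 can be sketched in Python with the standard library. Everything here is an assumption for illustration: the sample document, the element names, the regex used as a crude markup filter (it is not an XML parser and assumes no comments, CDATA sections, or '>' inside attribute values), and the handler that goes back over the original text to recover one entry's markup.

```python
import re
import xml.dom.minidom
import xml.sax

# Hypothetical sample: a dictionary whose entries carry inner markup.
DOC = ("<dict>"
       "<entry id='e1'><hw>logos</hw><def>a <i>word</i></def></entry>"
       "<entry id='e2'><hw>ergon</hw><def>a <i>work</i></def></entry>"
       "</dict>")

TAG = re.compile(r"<(/?)(\w+)[^>]*>")


def option1(doc):
    """Option 1: every element in the document becomes a DOM node."""
    return xml.dom.minidom.parseString(doc)


def option2_filter(doc, keep=("dict", "entry")):
    """Option 2, step 1: drop every tag not named in `keep`,
    then build a smaller DOM from what is left."""
    filtered = TAG.sub(lambda m: m.group(0) if m.group(2) in keep else "", doc)
    return xml.dom.minidom.parseString(filtered)


class EntryGrabber(xml.sax.ContentHandler):
    """Option 2, step 2: the filtered DOM has lost the inner markup,
    so we reparse the original text with SAX to find one entry's elements."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.depth = 0    # > 0 while inside the wanted entry
        self.names = []   # element names found inside that entry

    def startElement(self, name, attrs):
        if self.depth:
            self.depth += 1
            self.names.append(name)
        elif name == "entry" and attrs.get("id") == self.wanted_id:
            self.depth = 1

    def endElement(self, name):
        if self.depth:
            self.depth -= 1


dom1 = option1(DOC)
dom2 = option2_filter(DOC)
grabber = EntryGrabber("e1")
xml.sax.parseString(DOC.encode("utf-8"), grabber)  # second pass over the source
print(len(dom1.getElementsByTagName("*")))  # 9 element nodes
print(len(dom2.getElementsByTagName("*")))  # 3 element nodes
print(grabber.names)                        # ['hw', 'def', 'i']
```

The second pass in option #2 is exactly the reparse cost described above.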
What I am arguing for is:
3. In building the DOM tree, I recognize only as much of the tree as I
need. The markup that forms the remainder of the tree is still present;
it just passes unremarked into the DOM tree along with the PCDATA. Then,
when I retrieve the entry node, I parse only that element, recognizing
all the markup and PCDATA contained in it.
That seems to me to be a better processing model.
There is only one tree, and it is well-formed XML, but I have recognized
only part of it. As I understand the practice of XML processors, they
recognize the entire tree, not just part of it. (Yes, XSLT/XQuery can
recognize subtrees for extraction, but here I am positing processing the
entire document while recognizing only part of the tree.)
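Option #3 can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not JITTs or any existing DOM API: a regex scan stands in for a processor that recognizes just the entry elements (it assumes entries are never nested and attribute values contain no '>'), the markup inside each entry is stored unparsed in a text node, and the hypothetical expand() function parses one entry's markup only when asked.

```python
import re
import xml.dom.minidom

# Hypothetical sample document.
DOC = ("<dict>"
       "<entry id='e1'><hw>logos</hw><def>a <i>word</i></def></entry>"
       "<entry id='e2'><hw>ergon</hw><def>a <i>work</i></def></entry>"
       "</dict>")

# Recognize only <entry> elements; their content is captured unparsed.
ENTRY = re.compile(r"<entry\b([^>]*)>(.*?)</entry>", re.S)
ID_ATTR = re.compile(r"""id=['"]([^'"]*)['"]""")


def shallow_dom(doc):
    """Build a DOM holding only a <dict> root and its <entry> children.
    The markup inside each entry passes, unremarked, into a text node."""
    impl = xml.dom.minidom.getDOMImplementation()
    dom = impl.createDocument(None, "dict", None)
    for m in ENTRY.finditer(doc):
        entry = dom.createElement("entry")
        found = ID_ATTR.search(m.group(1))
        if found:
            entry.setAttribute("id", found.group(1))
        entry.appendChild(dom.createTextNode(m.group(2)))
        dom.documentElement.appendChild(entry)
    return dom


def expand(entry):
    """Parse one entry's stored markup on demand and return the new subtree."""
    raw = entry.firstChild.data
    return xml.dom.minidom.parseString("<entry>" + raw + "</entry>").documentElement


dom = shallow_dom(DOC)
entries = dom.getElementsByTagName("entry")
print(len(entries), len(dom.getElementsByTagName("i")))  # 2 0
first = expand(entries[0])
print([e.tagName for e in first.getElementsByTagName("*")])  # ['hw', 'def', 'i']
```

Only the entry that is actually requested pays the cost of full recognition; the other entry's markup stays as text.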
>>Probably time to end this particular thread. I was trying to
>>convince you that if everything the W3C has done with XML looks like
>>a tree, then it must have a tree model.
>How about that I agree that XML syntax is designed around a tree
>model, and you agree that a tree model isn't the *only* way to
>interpret a document in XML syntax? After all, as you've shown with
>JITTs, the only thing that gives meaning/structure to a document is
>the processor that's used on it.
Well..., I hesitate only because I think you use "XML syntax" to mean
far more than I do. Or, perhaps more accurately, I use the term in
different senses without clearly indicating which one I mean.
XML syntax (sense 1): The tokens beginning with stago and etago as
defined in XML 1.0. Possibly read by a JITTs processor as representing a
structure to be imposed on a document. Or read by a LMNL processor into
the LMNL data model.

XML syntax (sense 2): The formal requirements for a document to be
passed to a non-JITTs processor or parser, which are enumerated in
XML 1.0.
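To show how sense 1 can be read without a tree, here is a minimal Python sketch in the spirit of LMNL's ranges (it is not the LMNL data model, and the function name and sample are invented): start and end tags are treated as boundaries over the text, so overlapping markup that no XML 1.0 parser would accept still yields a structure. It ignores comments, processing instructions, empty-element tags, and entity references.

```python
import re

# Matches start tags (stago "<") and end tags (etago "</").
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*>")


def ranges(doc):
    """Read tags as range boundaries over the text.

    An end tag closes the most recently opened range with the same
    name, so ranges may overlap; nothing here requires nesting.
    Returns (text, [(name, start_offset, end_offset), ...]).
    """
    parts, open_ranges, result = [], {}, []
    offset, pos = 0, 0
    for m in TAG.finditer(doc):
        chunk = doc[pos:m.start()]
        parts.append(chunk)
        offset += len(chunk)
        pos = m.end()
        name = m.group(2)
        if m.group(1) == "/":
            stack = open_ranges.get(name)
            if not stack:
                raise ValueError(f"unmatched end tag </{name}>")
            result.append((name, stack.pop(), offset))
        else:
            open_ranges.setdefault(name, []).append(offset)
    parts.append(doc[pos:])
    return "".join(parts), result


# Lines and a quotation that overlap: not well-formed XML.
OVERLAP = "<l>In the <q>beginning </l><l>was</q> the word</l>"
text, found = ranges(OVERLAP)
print(text)   # In the beginning was the word
print(found)  # [('l', 0, 17), ('q', 7, 20), ('l', 17, 29)]
```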
Yes to the design around "a" tree model, and yes, one can read XML
syntax in either of my senses of the word as something other than "a"
tree. The problem remains that I think you are contending that XML
syntax (sense 2) controls how XML syntax (sense 1) can be used in a
document. (That is only the case if we consider the document to have
some structure prior to processing. Process one of my example files with
a JITTs processor and, if you request it, out comes what appears to be
XML syntax in sense 2. Whether or not the file complied with XML syntax
in sense 2 before processing does not seem to me to be relevant.)
>>I have failed in that attempt and don't really have any other
>>evidence to offer. (I don't consider a plethora of tree-based data
>>models persuasive at all that XML has one syntax and many data
>>models.)
>That's a fair point, but what about SAX? Or the productions used to
>describe XML syntax in the Recommendation? Neither of those are tree
>models (or at least, the BNF parse tree doesn't follow the same kind
>of tree as the one that you get in the DOM/XPath tree models).
SAX enforces well-formedness of its output and, to the extent that it is
based upon XML parsers, if it thinks it sees an XML document, it
enforces well-formedness on the input as well. (I don't know whether
that is required, but that is the behavior I have observed.)
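The observed behavior is easy to reproduce with Python's standard-library xml.sax (the handler and sample documents are illustrative only): the parser delivers events up to the point where the input stops being well-formed and then raises SAXParseException, so overlapping markup never reaches the application as a complete event stream. For what it's worth, XML 1.0 does require a conforming processor to report well-formedness violations as fatal errors.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Records start/end events so we can see what SAX delivered."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def try_parse(doc):
    """Return (events delivered, error message or None)."""
    handler = EventLog()
    try:
        xml.sax.parseString(doc.encode("utf-8"), handler)
        return handler.events, None
    except xml.sax.SAXParseException as err:
        return handler.events, err.getMessage()


print(try_parse("<l>ok</l>"))
print(try_parse("<l>In the <q>beginning </l><l>was</q> the word</l>"))
```

The second call reports the events for the two start tags and then stops with an error at the first mismatched end tag.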
The particular tree model does not concern me, because a tree is
limiting when describing data that is not simple, particularly when you
get only [...]
>>This is particularly important in light of your insistence that new
>>models of XML must still comply with the well-formedness strictures
>>of XML 1.0. You can have any model you like with XML, so long as it
>>is tree based. (Or convert to some other model but I don't share
>>your confidence in that strategy.)
>Hmm... I'm not sure how to move forward on this one. I don't think
>that anything will persuade me that a document that isn't well-formed
>is in XML syntax. We'll just have to agree to disagree.
Perhaps so, since there are a number of other issues about JITTs (and
LMNL too) that I think we can discuss productively.
BTW, the Jewels of Stringology book is now set to be published in
November 2002 (the original publication date was September). Anyone
interested in Gavin's work or LMNL (and I number myself among them)
should consider getting an order in now.
Last post for the day; I have more reports to get done tonight. I will
pick up again in the morning. (I should also have a new implementation
to post tomorrow, much more robust than the first but still a filter.)
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>The list archives are at http://lists.xml.org/archives/xml-dev/
Director of Research and Development
Society of Biblical Literature