OASIS Mailing List Archives

   Re: [xml-dev] heritage (was Re: [xml-dev] SGML on the Web)


Just got back from taking my daughter to the dentist! (Home from 
college, now making more appointments for the breaks!)

Jeni Tennison wrote:

>Hi Patrick,
>>>The document that you quote is not a normative definition of XML.
>>>There are many normative definitions of data models for XML,
>>>including the Infoset and XPath.
>>And it is luck that they all follow a tree based model?
>Not luck. Firstly, XML syntax is structured to support a tree-based
>model, because one of its well-formedness rules is that elements must
>be nested inside each other. Secondly, a tree-based model is an
>excellent choice for implementations since it makes it easy to keep
>track of context and to focus processing onto particular subtrees;
>these are advantages that are missed by APIs that don't assume a
>tree-based model, such as SAX.
I have not tried to claim that a tree model is not useful. The DOM 
"Lite" example is a case where you have what I suppose would be called a 
partial tree, that uses the tree model but only so far as it is useful 
in a given context. That approach avoids the shortcomings of an approach 
like SAX.

It is the singleness of the tree (or well-formedness, if you like) for 
any given instance that creates the problem. To use a non-overlapping 
example, consider the DOM example we have been batting back and forth. 
If we accept your view of XML syntax, I can:

1. Process the entire document into a DOM tree, or:

2. Filter the document to remove part of the markup that forms part of 
the tree and then create a DOM tree.

I think we would agree option #1 is the most intensive both in terms of 
processing and memory.

But I don't find #2 all that attractive since I have to reparse the 
document (probably using SAX) to get back to the part of interest that 
still has all the markup in it.

What I am arguing for is:

3. In building the DOM tree, I recognize only as much of the tree as I 
need. The markup that forms the remainder of the tree is still present; 
it just passes unremarked into the DOM tree along with the PCDATA. Then, 
when I get to the entry node, I parse only that element, recognizing all 
the markup and PCDATA it contains.

That seems to me to be a better processing model.
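
A minimal sketch of option 3 in Python, not the filter implementation 
discussed in this thread. The names raw_elements, parse_on_demand and 
SAMPLE, and the <dictionary>/<entry> vocabulary, are hypothetical. The 
sketch assumes the elements of interest do not nest, are never written 
as empty-element tags, and carry no ">" inside attribute values.

```python
import re
import xml.etree.ElementTree as ET


def raw_elements(doc, name):
    """Return the unparsed source text of every <name>...</name> element.

    Only start and end tags called `name` are recognized here; all other
    markup is carried along as plain characters, much like PCDATA.
    Assumption for this sketch: <name> elements do not nest.
    """
    pattern = re.compile(
        r"<{0}(?:\s[^>]*)?>.*?</{0}\s*>".format(re.escape(name)),
        re.DOTALL,
    )
    return [match.group(0) for match in pattern.finditer(doc)]


def parse_on_demand(raw):
    """Recognize all the markup inside one raw element, and nothing else."""
    return ET.fromstring(raw)


# Hypothetical sample document.
SAMPLE = (
    "<dictionary>"
    "<front><title>Sample</title></front>"
    "<entry id='a'><form>alpha</form><sense>first</sense></entry>"
    "<entry id='b'><form>beta</form></entry>"
    "</dictionary>"
)

entries = raw_elements(SAMPLE, "entry")   # markup inside is still raw text
second = parse_on_demand(entries[1])      # only this entry becomes a tree
print(second.get("id"), second.find("form").text)
```

The <front> matter and the other entry are never turned into nodes; 
their markup stays in the character stream until someone asks for it.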

There is only one tree, it is well-formed XML, but I have only 
recognized part of it. I understand the practice of XML processors to be 
a recognition of the entire tree and not part of it. (Yes, XSLT/XQuery 
can recognize subtrees for extraction but here I am positing a 
processing of the entire document and only recognizing part of the tree 
in it.)

>>Probably time to end this particular thread. I was trying to
>>convince you that if everything the W3C has done with XML looks like
>>a tree, then it must have a tree model.
>How about that I agree that XML syntax is designed around a tree
>model, and you agree that a tree model isn't the *only* way to
>interpret a document in XML syntax? After all, as you've shown with
>JITTs, the only thing that gives meaning/structure to a document is
>the processor that's used on it.
Well, ..., I hesitate only because I think you use "XML syntax" to mean 
far more than I do with the term. Or perhaps more accurately, I use the 
term in different senses without clearly indicating how I am using it. 

XML syntax (sense 1): The tokens beginning with stago and etago as 
defined in XML 1.0. Possibly read by a JITTs processor as representing a 
structure to be imposed on a document. Or read by a LMNL processor into 
the LMNL data model.

XML syntax (sense 2): The formal requirements for a document to be 
passed to a non-JITTs processor or parser, which are enumerated in 
XML 1.0.

Yes to the design around "a" tree model, and yes, one can read XML 
syntax in either of my senses of the term as something other than "a" 
tree. The problem remains that I think you are contending that my XML 
syntax (sense 2) controls how XML syntax (sense 1) can be used in a 
document. (That is only the case if we consider the document to have 
some structure prior to processing. Process one of my example files with 
a JITTs processor and, if you request it, out comes what appears to be 
XML syntax in sense 2. Whether or not the file complied with XML syntax 
in sense 2 prior to processing does not seem to me to be relevant.)

>>I have failed in that attempt and don't really have any other
>>evidence to offer. (I don't consider a plethora of tree based data
>>models persuasive at all that XML has a one syntax and many data
>That's a fair point, but what about SAX? Or the productions used to
>describe XML syntax in the Recommendation? Neither of those are tree
>models (or at least, the BNF parse tree doesn't follow the same kind
>of tree as the one that you get in the DOM/XPath tree models).
SAX enforces well-formedness of the output and, to the extent it is 
based upon XML parsers, if it thinks it sees an XML document, it 
enforces well-formedness on the input as well. (Don't know if that is 
required but that is the behavior observed.)
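
The input side of that behavior is easy to observe. A small sketch 
using Python's standard xml.sax module (one SAX implementation among 
many, so it says nothing about other drivers); accepted_by_sax is a 
hypothetical helper name. It relies on the default error handler, which 
raises on fatal errors.

```python
import xml.sax


def accepted_by_sax(text):
    """Return True if the SAX parser reads `text` without a fatal error."""
    try:
        # A bare ContentHandler ignores every event; we only care whether
        # the underlying parser rejects the input.
        xml.sax.parseString(text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


nested = "<a><b>text</b></a>"        # elements properly nested
overlapping = "<a><b>text</a></b>"   # overlapping markup, not well-formed

print(accepted_by_sax(nested), accepted_by_sax(overlapping))
```

The overlapping case is rejected before any application code gets a 
chance to interpret the tags differently.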

The particular tree model does not concern me because a tree is limiting 
when describing data that is not simple, particularly when you get only 

>>This is particularly important in light of your insistence that new
>>models of XML must still comply with the well-formedness strictures
>>of XML 1.0. You can have any model you like with XML, so long as it
>>is tree based. (Or convert to some other model but I don't share
>>your confidence in that strategy.)
>Hmm... I'm not sure how to move forward on this one. I don't think
>that anything will persuade me that a document that isn't well-formed
>is in XML syntax. We'll just have to agree to disagree.
Perhaps so because there are a number of other issues about JITTs (and 
LMNL too) that I think we can productively discuss.

BTW, the Jewels of Stringology book is now set to be published in 
November 2002 (original pub. date was September). Anyone interested in 
Gavin's work or LMNL (and I number myself among them) should consider 
getting their order in now.

Last post for the day, have more reports to get tonight. Will pick up 
again in the morning. (Should have a new implementation, much more 
robust than the first but still a filter, to post tomorrow as well.)


>Jeni Tennison
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>The list archives are at http://lists.xml.org/archives/xml-dev/
>To subscribe or unsubscribe from this list use the subscription
>manager: <http://lists.xml.org/ob/adm.pl>

Patrick Durusau
Director of Research and Development
Society of Biblical Literature

