OASIS Mailing List Archives


   Re: [xml-dev] heritage (was Re: [xml-dev] SGML on the Web)


Hi Patrick,

>>>>The document that you quote is not a normative definition of XML.
>>>>There are many normative definitions of data models for XML,
>>>>including the Infoset and XPath.
>>>And it is luck that they all follow a tree based model?
>>Not luck. Firstly, XML syntax is structured to support a tree-based
>>model, because one of its well-formedness rules is that elements
>>must be nested inside each other. Secondly, a tree-based model is an
>>excellent choice for implementations since it makes it easy to keep
>>track of context and to focus processing onto particular subtrees;
>>these are advantages that are missed by APIs that don't assume a
>>tree-based model, such as SAX.
> I have not tried to claim that a tree model is not useful.

I know :) I wasn't trying to imply that you had. You asked whether the
fact that most data models that are built from XML documents have a
tree model was "luck", and I was explaining that no, of course I don't
think it's just "luck".

> 3. I only recognize in building the DOM tree as much of the tree as
> I need. The markup that forms the remainder of the tree is still
> present, it just passes unremarked into the DOM tree with the
> PCDATA. Now when I get the entry node, I parse only that element and
> all the markup + PCDATA that is found there by recognizing all the
> markup contained therein.
> That seems to me to be a better processing model.

Sure, OK. In other words, you choose to read some markup and ignore
other markup (or rather, read the markup and character data together
into a string, as if it had <![CDATA[...]]> around it). I don't have
any problem with that at all.
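
The CDATA comparison above can be sketched in Python with the standard
library's xml.etree.ElementTree. This is only an illustration of the
analogy, not a JITTs implementation; the sample document and the helper
name parse_deferred are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the markup inside <entry> is wrapped in CDATA,
# so the first parse sees it as character data, not as elements.
SOURCE = (
    "<doc><entry><![CDATA[<term>tree</term> is a <pos>noun</pos>]]>"
    "</entry></doc>"
)


def parse_deferred(text):
    """Parse a string of markup and character data on demand.

    The string is wrapped in a synthetic root element because it may
    contain several top-level nodes.
    """
    return ET.fromstring("<wrapper>" + text + "</wrapper>")


doc = ET.fromstring(SOURCE)
entry = doc.find("entry")

# First pass: the inner markup is just a string, and <entry> has no children.
raw = entry.text
children_before = len(entry)

# Second pass: recognise the markup only when this entry is actually needed.
inner = parse_deferred(raw)
inner_tags = [child.tag for child in inner]
```

After the first pass, raw holds the unparsed markup as text and
children_before is 0; only the second pass turns that string into
term and pos elements.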

>>>Probably time to end this particular thread. I was trying to
>>>convince you that if everything the W3C has done with XML looks
>>>like a tree, then it must have a tree model.
>>How about that I agree that XML syntax is designed around a tree
>>model, and you agree that a tree model isn't the *only* way to
>>interpret a document in XML syntax? After all, as you've shown with
>>JITTs, the only thing that gives meaning/structure to a document is
>>the processor that's used on it.
> Well, ..., I hesitate only because I think you use "XML syntax" to
> mean far more than I do with the term. Or perhaps more accurately, I
> use the term in different senses without clearly indicating how I am
> using it. XML syntax (sense 1): The tokens beginning with stago and
> etago as defined in XML 1.0. Possibly read by a JITTs processor as
> representing a structure to be imposed on a document. Or read by a
> LMNL processor into the LMNL data model. XML syntax (sense 2): The
> formal requirements for a document to be passed to a non-JITTs
> processor or parser, which are enumerated in XML 1.0.

Right. I'm using it purely in terms of sense 2. Sense 1 doesn't count
as "XML syntax" as far as I'm concerned -- it's rather "XML-like
syntax" or "pseudo-XML syntax" or "a syntax that uses angle brackets
like XML does".
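
As a small illustration of that distinction, the Python sketch below
feeds two strings to a conforming parser (xml.etree.ElementTree, which
sits on top of expat). Both use angle brackets, but only the properly
nested one is accepted. The sample strings and the helper name
is_well_formed are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Elements nest inside each other, as XML 1.0 requires.
nested = "<line><phrase>properly nested</phrase></line>"

# Elements overlap: <line> closes while <phrase> is still open.
overlapping = "<line><phrase>overlapping</line></phrase>"

nested_ok = is_well_formed(nested)
overlapping_ok = is_well_formed(overlapping)
```

Here nested_ok is True and overlapping_ok is False: the second string
looks like XML but is rejected before any data model is built.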

> Yes to the design around "a" tree model and yes, can read XML syntax
> in either of my senses of the word as something other than "a" tree.
> The problem remains that I think you are contending my XML syntax
> (sense 2) controls how XML syntax (sense 1) can be used in a
> document. (That is only the case if we consider the document to have
> some structure prior to processing. Process one of my example files
> with a JITTs processor and if you request it, out comes what appears
> to be XML syntax in sense 2. Whether it did or did not have
> compliance prior to processing in the XML syntax 2 sense, does not
> seem to me to be relevant.)

Right -- I think that's where I disagree. I think that it's important
that you're clear what syntax (in sense 2 terms) is used in the
document that acts as the source for your JITTs process because that
source document is the one that some user is going to write in their
favourite text editor. Plus if your process receives something that it
doesn't understand, it probably won't be able to work. So it might not
be *as* relevant as the lovely XML tree that you get out at the end,
but it's still important to the user and the application.

>>>I have failed in that attempt and don't really have any other
>>>evidence to offer. (I don't consider a plethora of tree based data
>>>models persuasive at all that XML has a one syntax and many data
>>That's a fair point, but what about SAX? Or the productions used to
>>describe XML syntax in the Recommendation? Neither of those are tree
>>models (or at least, the BNF parse tree doesn't follow the same kind
>>of tree as the one that you get in the DOM/XPath tree models).
> SAX enforces well-formedness of the output and to the extent based
> upon XML parsers, if it thinks it sees an XML document, it enforces
> well-formedness on the input as well. (Don't know if that is
> required but that is the behavior observed.)

Sure, but as John pointed out, it's a *linear* data model rather than
a *tree* data model -- it translates an XML document into a sequence
of events.
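
The difference between the two kinds of model can be seen by running
the same document through both. The Python sketch below uses the
standard library's xml.sax and xml.etree.ElementTree; the sample
document and the class name EventLog are assumptions for the example.

```python
import xml.sax
import xml.etree.ElementTree as ET

SOURCE = "<a><b>x</b><c/></a>"


class EventLog(xml.sax.ContentHandler):
    """Record SAX callbacks in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


# Linear model: a flat sequence of events; any notion of "current
# context" has to be tracked by the handler itself.
log = EventLog()
xml.sax.parseString(SOURCE.encode("utf-8"), log)

# Tree model: the same document as nodes with parent/child links.
root = ET.fromstring(SOURCE)
child_tags = [child.tag for child in root]
```

log.events is a flat list of start, chars and end events, while root
already knows that b and c are its children.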

> Perhaps so because there are a number of other issues about JITTs
> (and LMNL too) that I think we can productively discuss.

Definitely :) I think that we're both interested in how to use schemas
(e.g. RELAX NG schemas) to extract trees from overlapping structures,
for example. But we should probably take the discussion over onto
LMNL-Dev rather than talk about it here...



Jeni Tennison



Copyright 2001 XML.org. This site is hosted by OASIS