OASIS Mailing List Archives


Re: [xml-dev] XML basics

On Tue, 1 Mar 2011 08:07:49 +0000, Joe Fawcett wrote:
> Thanks for your comments, can you suggest a good term for a generic building
> block of XML if I can't use the term 'node'?

There really isn't one that's generally agreed upon.  That's true for 
'node', btw.  Once you've created a tree, there's relatively little 
argument that an element is a 'node', but, just for instance, in the 
XQuery Data Model, it's possible to treat namespaces as not-nodes 
(detail isn't all that relevant here, I think).  Is text content a 
node?  How about ignorable whitespace?  Comments, processing 
instructions?  Entity references?  The XML declaration?  The internal 
subset?  What about the contents of the internal subset (which aren't 
quite XML, but do have those familiar pointy brackets)?  SAX has a 
characters() event--how many of those make up a node?  In the DOM, 
there are namespace attributes (which are nodes); other APIs treat 
namespaces and attributes as disjoint sets (and the XDM permits 
namespace bindings to be treated as something approximately like 
metadata on the tree, with no nodes available to navigate to).
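
To make the namespace point concrete, here is a minimal sketch, assuming 
Python's standard library stands in for "a DOM" (xml.dom.minidom) and for 
"another API" (xml.etree.ElementTree).  The sample document and the two 
helper names are mine, purely for illustration.  The same declaration 
shows up as an attribute node in one model and not at all in the other's 
attribute set:

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Hypothetical sample: one namespace declaration, one ordinary attribute.
SAMPLE = '<a xmlns:p="urn:example" id="1"><p:b/></a>'


def dom_attribute_names(xml_text):
    """Attribute names on the root element as minidom reports them."""
    root = minidom.parseString(xml_text).documentElement
    return sorted(root.attributes.keys())


def etree_attribute_names(xml_text):
    """Attribute names on the root element as ElementTree reports them."""
    root = ET.fromstring(xml_text)
    return sorted(root.attrib.keys())


# minidom lists the xmlns:p declaration alongside id;
# ElementTree lists only id.
print(dom_attribute_names(SAMPLE))
print(etree_attribute_names(SAMPLE))
```

Neither answer is wrong; the two libraries simply draw the line around 
"attribute" in different places.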

Is a document a node?  What's a document, then?  Is an external parsed 
entity a node?

DOM has 12 'node' types; the infoset has 11; XDM has 7 (this from 
memory, so I might have fudged a number or two, but the point should 
remain: the degree of variance indicates a rather slippery term, which 
means that it's up to you to define what you mean by it).
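
For what it's worth, the DOM figure is easy to check against a binding.  
A minimal sketch, assuming Python's xml.dom.Node mirrors the nodeType 
constants of DOM Core (the helper name is mine); it lists twelve 
constants, ELEMENT_NODE through NOTATION_NODE:

```python
from xml.dom import Node


def dom_node_type_names():
    """Names of the nodeType constants on xml.dom.Node, in numeric order."""
    names = [name for name in dir(Node) if name.endswith("_NODE")]
    return sorted(names, key=lambda name: getattr(Node, name))


for name in dom_node_type_names():
    print(getattr(Node, name), name)
```

This says nothing about the infoset or XDM counts, which have no 
comparable listing in the standard library.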

For a book on XML basics, you might reasonably say that a common 
programmatic representation of XML syntax in memory is as a tree of 
nodes, but unless you want to descend into the swamp (keep in mind that 
the XML Infoset spec came along after DOM and SAX and XPath and 
attempted to unify these three very different models, along with other 
inputs), it might be best to then innocently mention that exactly which 
syntactic constructs count as a node is not well-defined.

If you do achieve a definition ... what will it be?  'Node' in common 
usage indicates participation in a graph--nodes and edges, nodes and 
connections.  But (according to some very popular APIs) there are nodes 
that are not the children of their parents.  There are also nodes that 
are not visible in the syntax (if you accept that namespaces define 
nodes this is easy to show: xml:lang="en-US" with no xmlns:xml 
declaration).
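
Both oddities can be poked at from code.  A rough sketch, assuming 
Python's xml.dom.minidom as the DOM in question; the helper name and the 
sample document are hypothetical.  The attribute node has an owner 
element but no parent and is not among that element's children, and the 
xml prefix resolves to a namespace although nothing in the document 
declares it:

```python
from xml.dom import minidom

# The namespace that the xml prefix is bound to by definition.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def inspect_lang_attribute(xml_text):
    """Report how minidom relates the xml:lang attribute to its element."""
    root = minidom.parseString(xml_text).documentElement
    attr = root.getAttributeNode("xml:lang")
    return {
        # The attribute knows which element it belongs to ...
        "owner_is_root": attr.ownerElement is root,
        # ... yet it has no parent and is not one of the children.
        "parent": attr.parentNode,
        "in_child_nodes": any(child is attr for child in root.childNodes),
        # The prefix is bound although the document never declares it.
        "namespace": attr.namespaceURI,
        "attribute_names": sorted(root.attributes.keys()),
    }


info = inspect_lang_attribute('<doc xml:lang="en-US">text</doc>')
for key, value in info.items():
    print(key, "=", value)
```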

The preferred programmatic and algorithmic representations of XML vary 
both by usage and by the predilections of API designers, and a number 
of terms (notably including 'node') are overloaded.

The *syntax* is core; it's well-defined by a fairly terse collection of 
BNF in the base specification (which is usually amended by including 
the namespaces spec, to our sorrow, as I have come to think).  How that 
information is defined for programmatic examination and manipulation 
varies pretty widely, even among the W3C-produced specifications for 
XML, and even keeping to a limited set of implementation languages.  An 
event-oriented API (SAX or StAX in Java, for instance) is a reasonable 
next step.  You probably don't want to ignore tree models in a book on 
basics, but ... arm yourself, for there be dragons.
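
As a taste of the event-oriented view, here is a small sketch using 
Python's xml.sax rather than the Java APIs named above (my substitution; 
the EventLog class and log_events helper are hypothetical names).  It 
records each callback as it arrives, and shows why "how many characters() 
events make up a node" has no fixed answer: the parser is free to deliver 
one run of text in several pieces, so the handler has to join them itself.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Collects (kind, data) tuples in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # May be called more than once for a single run of text.
        self.events.append(("chars", content))


def log_events(xml_bytes):
    """Parse xml_bytes and return the list of recorded events."""
    handler = EventLog()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


events = log_events(b"<greeting>fish &amp; chips</greeting>")
# Reassemble the text from however many pieces the parser chose to send.
text = "".join(data for kind, data in events if kind == "chars")
print(events)
print(text)
```

How many "chars" entries appear depends on the parser, which is rather 
the point.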

Amelia A. Lewis                    amyzing {at} talsever.com
The less I seek my source for some definitive, the closer I am to fine.
                -- Indigo Girls
