OASIS Mailing List Archives


Re: [xml-dev] DOM versus XDM: Differences in handling CDATA sections, entities, and concurrency

Okay.  A part of my response is misleading (well, and the rest of it is 
nasty, but I can live with nasty more than misleading).

On Fri, 12 Nov 2010 12:36:02 -0500, Amelia A Lewis wrote:
> On Fri, 12 Nov 2010 11:38:56 -0500, Costello, Roger L. wrote:

>> DOM and XDM represent entities differently: 

This is true.

>>    - A DOM tree will have a node for the entity, as evidenced by 
>>      the fact that the DOM API has a method for accessing entities [4]. 

This is sort of true.

>>      Here is a graphic I created to show the DOM tree of the XML document:

That's wrong.


DOM has two node types related to entities.  In the Node interface, 
these types are indicated by the ENTITY_NODE and ENTITY_REFERENCE_NODE 
constants, which is what Node.getNodeType() returns for them.

An Entity node will respond to getNodeName() (an entity has a name), but 
not to getNodeValue() or getAttributes(), both of which return null.  An 
Entity node will respond to getChildNodes().  Note that an Entity node 
does *not* represent a character reference or a reference to a 
predefined entity.  Entity 
nodes have several additional methods (apart from the foregoing, which 
are defined for Node, the base interface): getInputEncoding(), 
getNotationName(), getPublicId(), getSystemId(), getXmlEncoding(), 
getXmlVersion().  It should be clear from these methods that an entity 
in DOM represents something relatively complex (a parsed or unparsed 
general entity, possibly external, possibly tied to a notation).

From the Javadoc:

"An XML processor may choose to completely expand entities before the 
structure model is passed to the DOM; in this case there will be no 
EntityReference nodes in the document tree.
XML does not mandate that a non-validating XML processor read and 
process entity declarations made in the external subset or declared in 
parameter entities. This means that parsed entities declared in the 
external subset need not be expanded by some classes of applications, 
and that the replacement text of the entity may not be available. When 
the replacement text is available, the corresponding Entity node's 
child list represents the structure of that replacement value. 
Otherwise, the child list is empty."
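
For anyone who wants to poke at this, here is a minimal sketch of what an 
Entity node looks like through the Java DOM API.  It assumes the JAXP 
parser bundled with the JDK; the class name, the sample document, and the 
entity names (logo, greeting) are made up for illustration.  Whether the 
child list of the parsed entity is populated depends on the parser, so the 
sketch prints the count rather than promising a value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Entity;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative only: class, document, and entity names are hypothetical.
class EntityNodeSketch {
    static final String XML =
          "<!DOCTYPE doc [\n"
        + "  <!ELEMENT doc (#PCDATA)>\n"
        + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
        + "  <!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>\n"
        + "  <!ENTITY greeting \"hello\">\n"
        + "]>\n"
        + "<doc>text</doc>";

    // Parse the sample document with the default (non-validating) parser.
    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Entity nodes are not part of the document tree proper;
    // they are reached through the DocumentType node.
    static Entity entity(Document doc, String name) {
        return (Entity) doc.getDoctype().getEntities().getNamedItem(name);
    }

    public static void main(String[] args) {
        Document doc = parse();
        for (String name : new String[] {"logo", "greeting"}) {
            Entity e = entity(doc, name);
            System.out.println(e.getNodeName()
                + " isEntityNode=" + (e.getNodeType() == Node.ENTITY_NODE)
                + " value=" + e.getNodeValue()
                + " attributes=" + e.getAttributes()
                + " systemId=" + e.getSystemId()
                + " notation=" + e.getNotationName()
                + " children=" + e.getChildNodes().getLength());
        }
    }
}
```

Note that neither entity is referenced from the document content; the 
Entity nodes exist because of the declarations, not because of any use.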

An EntityReference node will respond to getNodeName() (an entity 
reference has a name, which is the name of the referenced entity), but 
not to getNodeValue() or getAttributes().  It has no extension 
methods.  An EntityReference refers to some defined Entity.

From the Javadoc:

"EntityReference nodes may be used to represent an entity reference in 
the tree. Note that character references and references to predefined 
entities are considered to be expanded by the HTML or XML processor so 
that characters are represented by their Unicode equivalent rather than 
by an entity reference. Moreover, the XML processor may completely 
expand references to entities while building the Document, instead of 
providing EntityReference nodes. If it does provide such nodes, then 
for an EntityReference node that represents a reference to a known 
entity an Entity exists, and the subtree of the EntityReference node is 
a copy of the Entity node subtree. [...]" (the elision represents a 
special case; let's deal with the primary case first)
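
A rough sketch of the same point in Java, again assuming the JDK's bundled 
JAXP parser (the class name and sample document are hypothetical).  It 
parses one document twice, once with entity references expanded and once 
asking the parser to keep them, and counts EntityReference nodes under the 
document element.  The character reference and the predefined entity in 
the sample never show up as EntityReference nodes either way.  What hangs 
below a preserved EntityReference varies by parser and version, so the 
sketch does not look at that.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative only: class, document, and entity names are hypothetical.
class EntityReferenceSketch {
    static final String XML =
          "<!DOCTYPE doc [\n"
        + "  <!ELEMENT doc (#PCDATA)>\n"
        + "  <!ENTITY greeting \"hello\">\n"
        + "]>\n"
        + "<doc>&greeting; world &amp; &#65;</doc>";

    // expand=true asks the parser to replace references with their text;
    // expand=false asks it to keep EntityReference nodes in the tree.
    static Document parse(boolean expand) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setExpandEntityReferences(expand);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Count ENTITY_REFERENCE_NODE nodes in the subtree rooted at node.
    static int countEntityReferences(Node node) {
        int n = node.getNodeType() == Node.ENTITY_REFERENCE_NODE ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            n += countEntityReferences(c);
        }
        return n;
    }

    public static void main(String[] args) {
        for (boolean expand : new boolean[] {true, false}) {
            Document doc = parse(expand);
            System.out.println("expand=" + expand + " entityReferences="
                + countEntityReferences(doc.getDocumentElement()));
        }
    }
}
```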

Note that all of the above is corner case stuff anyway.

Now, if you happen to want to deal with this ... mess ... in XDM, some 
provision is made.

In the XDM, a Document node will respond to two accessors that no other 
node responds to: 

dm:unparsed-entity-public-id($node as node(), $entityname as xs:string) as xs:string?
dm:unparsed-entity-system-id($node as node(), $entityname as xs:string) as xs:anyURI?

Both of these are actually *defined* on all seven node types, but you 
only get anything useful from the document node (everything else 
returns the empty sequence).  These two functions/accessors provide 
access to a property defined on the Document node: unparsed-entities.  Just in 
case it wasn't clear, here again the entities in question are unparsed; 
unlike the DOM, if you wanna do something with them, it's up to you to 
go and parse them.
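
To make the shape of those two accessors concrete, here is a toy model in 
Java.  It is emphatically not any real XDM processor's API: every name in 
it (XdmAccessorSketch, XdmNode, Kind, the method names) is hypothetical, 
Optional.empty() stands in for the empty sequence, and plain strings stand 
in for the XDM atomic types.  All it shows is that the accessors can be 
called on any node kind but only a document node has anything to answer 
with.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy model of the two accessors; not a real processor's API.
class XdmAccessorSketch {
    // The seven XDM node kinds.
    enum Kind { DOCUMENT, ELEMENT, ATTRIBUTE, NAMESPACE,
                PROCESSING_INSTRUCTION, COMMENT, TEXT }

    static final class XdmNode {
        final Kind kind;
        // entity name -> {publicId, systemId}; only kept on document nodes
        final Map<String, String[]> unparsedEntities;

        XdmNode(Kind kind, Map<String, String[]> unparsedEntities) {
            this.kind = kind;
            this.unparsedEntities = kind == Kind.DOCUMENT
                    ? unparsedEntities
                    : Collections.<String, String[]>emptyMap();
        }
    }

    // Defined for every node kind; empty for everything but a document
    // node that actually declares the named unparsed entity.
    static Optional<String> unparsedEntityPublicId(XdmNode node, String name) {
        String[] ids = node.unparsedEntities.get(name);
        return ids == null ? Optional.<String>empty() : Optional.ofNullable(ids[0]);
    }

    static Optional<String> unparsedEntitySystemId(XdmNode node, String name) {
        String[] ids = node.unparsedEntities.get(name);
        return ids == null ? Optional.<String>empty() : Optional.ofNullable(ids[1]);
    }

    public static void main(String[] args) {
        Map<String, String[]> entities = new HashMap<>();
        entities.put("logo", new String[] {null, "logo.gif"});
        XdmNode doc = new XdmNode(Kind.DOCUMENT, entities);
        XdmNode elem = new XdmNode(Kind.ELEMENT, entities);
        System.out.println(unparsedEntitySystemId(doc, "logo"));
        System.out.println(unparsedEntitySystemId(elem, "logo"));
        System.out.println(unparsedEntityPublicId(doc, "logo"));
    }
}
```

The model hands back identifiers and nothing else, which is the point: 
fetching and interpreting whatever the system identifier names is left to 
the caller.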

Now ... possibly this will help.  I have some doubts, but since the 
original posting wanted to draw a distinction between the DOM vs XDM 
handling of entities, it might be worthwhile to have wasted half an 
hour or so looking things up in this fashion.

I'll add that the initial posting shows a fairly severe confusion 
between specification and implementation.  While it is true that the 
specifications in question have different implementation profiles and 
constraints, I'm not at all certain that most of the questions asked 
make any sense outside the context of a specific implementation of each 
of the specifications, in a host language.

(whose random .signature generator appears to be in a *puckish* mood)
Amelia A. Lewis                    amyzing {at} talsever.com
Being your slave, what should I do but tend
upon the hours and times of your desire?
I have no precious time at all to spend,
nor services to do, till you require.
