DOM versus XDM: Differences in handling CDATA sections, entities, and concurrency

Hi Folks,

My understanding is that an XML document is first processed by an XML parser, which creates an in-memory tree representation of the XML document. Then, an application such as an XML Schema validator or an XSLT processor operates on the in-memory tree representation. Here is a simple graphic I created to show this:

http://www.xfront.com/DOM-versus-XDM/How-an-XML-document-is-processed.gif


It is my understanding that the in-memory model created by different XML parsers may be different, depending on whether the XML parser creates a DOM or XDM in-memory model. 

Here are two places where differences arise:

   - CDATA sections
   - Entities

Also, there are differences with respect to:

   - Concurrent access


------------------------------
CDATA SECTIONS: DOM VERSUS XDM
------------------------------

This XML document contains a CDATA section:

    <root>
        hello <![CDATA[if A < B then ...]]> world
    </root>

As mentioned, there are two ways to model XML documents:

  - Document Object Model (DOM) [1]

  - XQuery and XPath Data Model (XDM) [2]

The two models represent the above XML document differently:

   - A DOM tree will have a node for the CDATA section, as evidenced by 
     the fact that the DOM API defines an interface for CDATA sections [3]. 
     Here is a graphic I created to show the DOM tree for the XML document:

     http://www.xfront.com/DOM-versus-XDM/DOM-implementation-of-CDATA.gif  

   - An XDM tree does not have a node for the CDATA section. The CDATA
     section is resolved; i.e., the contents of the CDATA section are
     merged with the surrounding text. 
     Here is a graphic I created to show the XDM tree for the XML document:

     http://www.xfront.com/DOM-versus-XDM/XDM-implementation-of-CDATA.gif 


Notice that in the DOM tree there are three nodes under the Element node, whereas in XDM there is only one Text node under the Element node.
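
The difference can be checked with a small sketch using Python's standard library. This is only an illustration under stated assumptions: xml.dom.minidom stands in for "a DOM implementation", and xml.etree.ElementTree is not an XDM implementation but happens to merge CDATA content into the surrounding text in a similar way. The document is written on one line so that indentation whitespace does not clutter the output.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Same content as the example document, without the indentation whitespace.
XML = "<root>hello <![CDATA[if A < B then ...]]> world</root>"

# DOM view (minidom): the CDATA section is kept as its own node.
dom_root = minidom.parseString(XML).documentElement
dom_children = [(node.nodeName, node.data) for node in dom_root.childNodes]
for name, data in dom_children:
    print(name, repr(data))

# Merged view (ElementTree, used here only as a stand-in for an
# XDM-style tree): the CDATA content is folded into one text value.
merged_text = ET.fromstring(XML).text
print(repr(merged_text))
```

With minidom the element has three children (text, CDATA section, text); with ElementTree there is a single text value containing "hello if A < B then ... world".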

------------------------------
ENTITIES: DOM VERSUS XDM
------------------------------

This XML document uses an entity:

    <root>
        hello if A &lt; B then ... world
    </root>

DOM and XDM represent entities differently: 

   - A DOM tree will have a node for the entity, as evidenced by 
     the fact that the DOM API defines an interface for entities [4]. 
     Here is a graphic I created to show the DOM tree of the XML document:

     http://www.xfront.com/DOM-versus-XDM/DOM-implementation-of-entities.gif   

   - An XDM tree does not have a node for the entity. The entity
     is resolved; i.e., the entity is replaced by its replacement
     text and is merged with the surrounding text. 
     Here is a graphic I created to show the XDM tree of the XML document:

     http://www.xfront.com/DOM-versus-XDM/XDM-implementation-of-entities.gif  


Notice that in the DOM tree there are three nodes under the Element node, whereas in XDM there is only one Text node under the Element node.
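
One way to test this claim against a concrete implementation is the sketch below, again assuming Python's xml.dom.minidom as the DOM implementation. Note that &lt; is one of the predefined entities, and the DOM Level 1 specification treats references to predefined entities as already expanded by the XML processor. So, at least in minidom, this particular document yields a single Text node rather than three. A DOM implementation that is configured to preserve references to entities declared in a DTD may behave differently; that case is not covered here.

```python
from xml.dom import minidom

# Same content as the example document, without the indentation whitespace.
XML = "<root>hello if A &lt; B then ... world</root>"

root = minidom.parseString(XML).documentElement

# Collect the node name and text of every child of <root>.
children = [(node.nodeName, node.data) for node in root.childNodes]
for name, data in children:
    print(name, repr(data))
```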

---------------------------------
CONCURRENT ACCESS: DOM VERSUS XDM
---------------------------------

There are occasions where multiple applications (processes) need to operate on the same in-memory tree. Recently, Hans-Juergen Rennau reported [5] problems with concurrent access to DOM trees. He found no problems with concurrent access to XDM trees. 

I then learned [6] that the DOM specification does not require that implementations provide a thread-safe DOM API; i.e., it does not require that concurrent access to a DOM tree be properly synchronized.
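
Since the DOM specification leaves synchronization to the application, one common approach is to guard every access to the shared tree with a lock. The following is a minimal sketch of that pattern, not a description of what any particular product does. It assumes Python's xml.dom.minidom as the DOM implementation; the names tree_lock and add_items are made up for illustration.

```python
import threading
from xml.dom import minidom

# One DOM tree shared by several threads.
doc = minidom.parseString("<root/>")

# Hypothetical application-level lock; the DOM API itself provides none.
tree_lock = threading.Lock()

def add_items(worker_id, count):
    """Append <item> elements to the shared tree, one locked step at a time."""
    for _ in range(count):
        with tree_lock:
            item = doc.createElement("item")
            item.setAttribute("worker", str(worker_id))
            doc.documentElement.appendChild(item)

threads = [threading.Thread(target=add_items, args=(w, 100)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

item_count = len(doc.documentElement.getElementsByTagName("item"))
print(item_count)
```

Readers of the tree would need to take the same lock while another thread may be modifying it; a tree that is never modified after construction is a different case and may not need this.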


QUESTIONS

1. Are the above description and graphic of how XML documents are processed correct?

2. Are the above description and graphics of the differences in how CDATA sections and entities are represented in DOM and XDM correct?

3. Is the above description of the differences in thread-safety of DOM and XDM correct?

4. Will an application behave differently depending on whether the XML parser it uses generates a DOM or an XDM tree? If so, isn't that really bad?

5. Do XML Schema validators use DOM or XDM to represent the XML Schema and the XML instance document?

6. If I were to create my own XML Schema validator, do I have the option of choosing to use DOM or XDM? Or, does the XML Schema specification require me to use one of them? If so, which one?

7. Do XSLT processors use DOM or XDM to represent the XSLT document and the XML instance document?

8. If I were to create my own XSLT processor, do I have the option of choosing to use DOM or XDM? Or, does the XSLT specification require me to use one of them? If so, which one?

9. For each of the following products, does it use DOM or XDM?

XML Schema Validators

   - XERCES: DOM or XDM?

   - SAXON: DOM or XDM?

   - XSV: DOM or XDM?

   - MSXML: DOM or XDM?

   - LIBXML: DOM or XDM?

XSLT Processors

   - XALAN: DOM or XDM?

   - SAXON: DOM or XDM?

   - MSXML: DOM or XDM?

   - XSLTPROC: DOM or XDM?


/Roger


[1] http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html 

[2] http://www.w3.org/TR/xpath-datamodel/ 

[3] http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-667469212 

[4] http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-11C98490 

[5] http://sourceforge.net/mailarchive/forum.php?thread_name=4CDBE667.2050400%40saxonica.com&forum_name=saxon-help 

[6] http://sourceforge.net/mailarchive/message.php?msg_name=4CDBE667.2050400%40saxonica.com 



