Re: [xml-dev] [Summary] How can the content of a leaf element be multiple text nodes?

Hi Roger,
    Interestingly, your topic has XSD 1.1 related implications (as tested with Apache Xerces).

Please consider the following XSD 1.1 validation example.

XML instance document:
<?xml version="1.0"?>
<Test>abc<!-- blah -->def</Test>

XSD 1.1 document:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="Test">
       <xs:complexType>
          <xs:simpleContent>
             <xs:extension base="xs:string">
                <xs:assert test="count(text()) = 1"/>
                <xs:assert test="text() = 'abcdef'"/>
             </xs:extension>
          </xs:simpleContent>
       </xs:complexType>
    </xs:element>

</xs:schema>

With default options, the XSD document above reports the XML instance document as valid.

But when we pass the option -acp to the Xerces sample jaxp.SourceValidator, the same validation reports the instance as invalid. The -acp option makes the xs:assert tree retain comments and PIs; the equivalent for JAXP-driven validation is to manually set the feature http://apache.org/xml/features/validation/assert-comments-and-pi-checking to true. With the -acp option, it is instead the following XSD 1.1 document that reports the same XML instance document as valid:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="Test">
       <xs:complexType>
          <xs:simpleContent>
             <xs:extension base="xs:string">
                <xs:assert test="count(text()) = 2"/>
                <xs:assert test="text()[1] = 'abc'"/>
                <xs:assert test="text()[2] = 'def'"/>
                <xs:assert test="comment() = ' blah '"/>
             </xs:extension>
          </xs:simpleContent>
       </xs:complexType>
    </xs:element>

</xs:schema> 
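
One way to picture the two outcomes is as two views of the same element: one in which comments and PIs are dropped before the assertions run, so the text on either side merges, and one in which they are kept. The Python sketch below is only a rough model of that difference, using the standard library's xml.dom.minidom. It is not how Xerces implements the feature, and the helper name text_nodes and its keep_comments_and_pis parameter are made up for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def text_nodes(xml, keep_comments_and_pis):
    """Return the text-node values of the root element as a list of strings."""
    root = parseString(xml).documentElement
    values = []
    previous_was_text = False
    for child in root.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            if previous_was_text:
                values[-1] += child.data  # adjacent text merges into one node
            else:
                values.append(child.data)
            previous_was_text = True
        elif child.nodeType in (Node.COMMENT_NODE,
                                Node.PROCESSING_INSTRUCTION_NODE):
            # A retained comment or PI separates the text around it;
            # a dropped one leaves that text adjacent.
            if keep_comments_and_pis:
                previous_was_text = False
        else:
            previous_was_text = False
    return values

doc = "<Test>abc<!-- blah -->def</Test>"
default_view = text_nodes(doc, keep_comments_and_pis=False)
acp_view = text_nodes(doc, keep_comments_and_pis=True)
print(default_view)  # ['abcdef']
print(acp_view)      # ['abc', 'def']
```

The first list matches what the first schema asserts (one text node, 'abcdef'), and the second matches what the second schema asserts (two text nodes, 'abc' and 'def').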

On Sat, Feb 12, 2022 at 7:09 PM Roger L Costello <costello@mitre.org> wrote:

Thank you again Michael, Ken, and Liam for your outstanding explanations! Here is my summary of all that I learned:

An XML leaf element can contain more than one text node. For example, suppose that a <Test> leaf element contains "abc" and "def" and they are separated by a comment:

<Test>abc<!-- blah -->def</Test>

The <Test> element contains two text nodes:

text[1] = "abc"
text[2] = "def"

A leaf element will never contain two adjacent text nodes. For example, in this <Test> element "abc" and "def" are separated by a space:

<Test>abc def</Test>

There is only one text node, and its value is "abc def"

If "abc" and "def" are separated by a processing instruction (PI):

<Test>abc<?foo test?>def</Test>

then again there are two text nodes.
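
The three cases so far (comment, space, PI) can be checked with any parser that exposes comments and PIs as nodes. Here is a minimal sketch using Python's standard xml.dom.minidom; the helper name child_text_nodes is hypothetical, and the DOM is used here only as a stand-in for the XPath data model.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def child_text_nodes(xml):
    """Return the values of the text-node children of the root element."""
    root = parseString(xml).documentElement
    return [c.data for c in root.childNodes if c.nodeType == Node.TEXT_NODE]

with_comment = child_text_nodes("<Test>abc<!-- blah -->def</Test>")
with_space = child_text_nodes("<Test>abc def</Test>")
with_pi = child_text_nodes("<Test>abc<?foo test?>def</Test>")

print(with_comment)  # ['abc', 'def']
print(with_space)    # ['abc def']
print(with_pi)       # ['abc', 'def']
```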

However, if "abc" and "def" are separated by a CDATA section:

<Test>abc<![CDATA[blah]]>def</Test>

then there is only one text node, and its value is: abcblahdef

The CDATA section is simply a wrapper around text; the wrapper is removed by the XML parser.

If "abc" and "def" are separated by an entity reference:

<Test>abc&amp;def</Test>

then there is only one text node, and its value is: abc&def
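
The CDATA and entity cases can be sketched in Python as well, with one caveat: xml.dom.minidom, being a DOM, keeps a CDATA section as its own node type, whereas the XPath data model described above has no such node. The hypothetical helper merged_text_nodes below therefore merges adjacent text and CDATA nodes to mimic the XPath view; that merging step is an assumption of this sketch, not something minidom does for you.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

TEXT_LIKE = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)

def merged_text_nodes(xml):
    """Return text-node values of the root element, XPath style:
    adjacent text and CDATA content count as a single text node."""
    root = parseString(xml).documentElement
    values = []
    previous_was_text = False
    for child in root.childNodes:
        is_text = child.nodeType in TEXT_LIKE
        if is_text and previous_was_text:
            values[-1] += child.data   # extend the current text node
        elif is_text:
            values.append(child.data)  # start a new text node
        previous_was_text = is_text
    return values

with_cdata = merged_text_nodes("<Test>abc<![CDATA[blah]]>def</Test>")
with_entity = merged_text_nodes("<Test>abc&amp;def</Test>")

print(with_cdata)   # ['abcblahdef']
print(with_entity)  # ['abc&def']
```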

One way to display the text node(s) is to create an XPath expression and then execute the expression. This XPath expression can be used to count the number of text nodes in the <Test> element:

count(Test/text())

This XPath expression can be used to show the content of the first text node:

Test/text()[1]

This can be used to show the content of the second text node (if there is one):

Test/text()[2]

And this can be used to show the sequence of text nodes:

Test/text()
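
If no XPath tool is at hand, the four expressions above can be approximated by walking the tree directly. Python's standard library does not include a full XPath engine, so the sketch below uses plain xml.dom.minidom navigation as a stand-in; the variable names are illustrative, and each line is annotated with the XPath expression it imitates.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString("<Test>abc<!-- blah -->def</Test>")
test = doc.documentElement  # the <Test> element

# Test/text(): the sequence of text-node children
texts = [c for c in test.childNodes if c.nodeType == Node.TEXT_NODE]

count = len(texts)                             # count(Test/text())
first = texts[0].data                          # Test/text()[1]
second = texts[1].data if count > 1 else None  # Test/text()[2]

print(count)   # 2
print(first)   # abc
print(second)  # def
```

Note that XPath positions start at 1, while Python list indexes start at 0.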

When you execute any of these XPath expressions, you will see a visual representation of the result. That visual representation might be misleading! For example, recall the case where "abc" and "def" are separated by a comment:

<Test>abc<!-- blah -->def</Test>

We now know that the <Test> element contains two text nodes. However, when I executed this XPath expression:

Test/text()

I saw this result:

abcdef

Liam executed (using a different XPath tool) the same XPath expression on the same <Test> element and got this result:

-- NODE --

abc

-- NODE --

def

My XPath tool misled me into thinking that the <Test> element has only one text node.

Similarly, when I ran the same XPath expression on this <Test> element:

<Test>abc&amp;def</Test>

 

I saw this result:

 

abc&amp;def

 

Again, my XPath tool misled me into thinking that the XML entity was not resolved (i.e., &amp; was not converted to &). In fact, however, the actual result of executing the XPath expression is this:

 

abc&def

 

The entity is resolved.

 

Important lesson: Distinguish the content of the text node from its visual representation. The XPath spec doesn't say anything about the visual representation.
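
The gap between content and representation is easy to reproduce. The sketch below, again using Python's xml.dom.minidom as a stand-in for an XPath tool, shows two ways of displaying the same two text nodes, and two ways of displaying the same text node that contains an ampersand. The helper name text_children is hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def text_children(xml):
    """Return the text-node children of the root element."""
    root = parseString(xml).documentElement
    return [c for c in root.childNodes if c.nodeType == Node.TEXT_NODE]

# Two text nodes, displayed two ways
two_nodes = text_children("<Test>abc<!-- blah -->def</Test>")
joined_display = "".join(n.data for n in two_nodes)  # looks like one node
per_node_display = [n.data for n in two_nodes]       # shows both nodes

# One text node, displayed two ways
amp_node = text_children("<Test>abc&amp;def</Test>")[0]
content = amp_node.data        # the actual string value of the node
serialized = amp_node.toxml()  # the value re-escaped for XML output

print(joined_display)    # abcdef
print(per_node_display)  # ['abc', 'def']
print(content)           # abc&def
print(serialized)        # abc&amp;def
```

A tool that prints the joined or serialized form is not wrong; it is showing a rendering of the nodes rather than the nodes themselves.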

 

--
Regards,
Mukul Gandhi



Copyright 1993-2007 XML.org. This site is hosted by OASIS