Re: [xml-dev] Caution! XML parsers behave differently with whitespace specified directly in attribute value versus whitespace specified via an ENTITY
- From: Christophe Marchand <cmarchand@oxiane.com>
- To: xml-dev@lists.xml.org
- Date: Fri, 8 Apr 2016 14:31:46 +0200
The question is: why, today, when we have various tools to modify XML
files, should we persist in using entities?
Christophe
On 08/04/2016 13:49, Costello, Roger L. wrote:
Hi Folks,
I created a schema which declares an element “test” whose value must
be the string: Column#1 tab (hex 9) Column#2:
<xs:element name="test">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="Column#1&#x9;Column#2"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
This XML document conforms to the schema:
<test>Column#1	Column#2</test>
Good.
Next, I decided to do some abstraction: I created an ENTITY for the
tab character and then used the entity in the declaration of the
“test” element:
<!DOCTYPE xs:schema [
<!ENTITY TAB '&#x9;'>
]>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="test">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="Column#1&TAB;Column#2"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
I validated the above XML document against the new schema and got this
error message:
The instance document has the content Column#1\tColumn#2,
which does not match the pattern facet Column#1 Column#2.
Huh? My schema didn’t specify a space character between Column#1 and
Column#2.
Michael Kay and Ken Holman filled me in on what’s happening. In the
second schema (the one using the ENTITY) the pattern facet’s attribute
value (Column#1&TAB;Column#2) is being “normalized” by the XML parser.
That is, the tab character is being replaced by a space character. Oddly,
in the first schema the pattern facet’s attribute value is _not_
normalized. That seemingly arbitrary behavior has to do with what
appears to be a gap in the XML specification. [Lessons learned:
(1) Writing a good specification is really, really hard. (2) When
writing a specification you must nail down every last detail.] Michael
Kay explains it this way:
It's called attribute value normalization, and is described in the
XML specification. It's part of the bizarreness of XML not being able
to define consistently whether and when whitespace is significant.
If you write a newline character entity explicitly in an attribute
value, then it decides you probably intended it, but if a newline
gets in there by expanding an entity reference, it decides that you
probably didn't.
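The difference is easy to reproduce outside a schema validator. The
following is a minimal Python sketch, not part of the original exchange.
It assumes the standard-library xml.etree.ElementTree parser (which is
built on expat) and uses a hypothetical helper, pattern_value, to report
the attribute value that the parser hands to the application. The last
case, which declares the entity with a doubly escaped reference
(&#38;#x9;), is one possible workaround suggested by the normalization
rules, not something proposed in the thread.

```python
import xml.etree.ElementTree as ET


def pattern_value(attr_text, dtd=""):
    """Parse a one-element document and return its 'value' attribute
    exactly as the parser reports it (hypothetical helper)."""
    doc = f'{dtd}<pattern value="{attr_text}"/>'
    return ET.fromstring(doc).get("value")


# Entity whose replacement text is a literal tab: the character
# reference is expanded when the declaration is read.
DTD_PLAIN = "<!DOCTYPE pattern [<!ENTITY TAB '&#x9;'>]>"

# Assumed workaround: escape the ampersand so the replacement text is
# the character reference itself, which is only resolved inside the
# attribute value.
DTD_ESCAPED = "<!DOCTYPE pattern [<!ENTITY TAB '&#38;#x9;'>]>"

# 1. Character reference written directly in the attribute value.
char_ref = pattern_value("Column#1&#x9;Column#2")

# 2. Literal tab character typed into the attribute value.
literal = pattern_value("Column#1\tColumn#2")

# 3. Tab supplied through the entity, as in the second schema.
via_entity = pattern_value("Column#1&TAB;Column#2", DTD_PLAIN)

# 4. Tab supplied through the doubly escaped entity.
via_escaped_entity = pattern_value("Column#1&TAB;Column#2", DTD_ESCAPED)

for label, value in [
    ("character reference", char_ref),
    ("literal tab", literal),
    ("entity", via_entity),
    ("escaped entity", via_escaped_entity),
]:
    print(f"{label:20} {value!r}")
```

Under the normalization rules, cases 1 and 4 should keep the tab, while
cases 2 and 3 should come back with a space in its place, which matches
the validator's error message above.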
Yikes!
/Roger