Re: [xml-dev] Why isn't the semicolon a reserved character?
- From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Sat, 15 Mar 2014 19:33:31 -0400
At 2014-03-15 21:41 +0000, Costello, Roger L. wrote:
This XML document is not well-formed:
<Document>
]]>
</Document>
Why? Because the XML parser sees that and thinks that the > symbol
marks the end of a CDATA section;
False. The "]]>" marks the end of a CDATA section:
http://www.w3.org/TR/2008/REC-xml-20081126/#NT-CDEnd
A simple ">" in parsed character data is not a problem when it is not
preceded by two right square brackets. This comes up in my XML
syntax class (which, since December, has been available for streaming
on Pluralsight).
The following is well-formed as the simple greater-than symbol does
not mark the end of a CDATA section:
<?xml version="1.0" encoding="UTF-8"?>
<doc>
This is a > greater-than symbol.
</doc>
the XML parser throws an error since there is no preceding <![[CDATA
To be precise in a way that answers a later question below, it throws
an error because at the point the end of CDATA was encountered it was
not in a CDATA section. Which, BTW, you mistyped ... the start of a
CDATA section is <![CDATA[ per:
http://www.w3.org/TR/2008/REC-xml-20081126/#NT-CDStart
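The distinction between a lone ">" and the three-character "]]>" can be checked with any conforming parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module (backed by expat in CPython); the helper name is_well_formed is purely illustrative:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A lone ">" in parsed character data.
bare_gt = "<doc>\nThis is a > greater-than symbol.\n</doc>"
# The sequence "]]>" in parsed character data, outside any CDATA section.
stray_cdend = "<Document>\n]]>\n</Document>"
# The same sequence with the ">" written as the built-in entity.
escaped_cdend = "<Document>\n]]&gt;\n</Document>"

for sample in (bare_gt, stray_cdend, escaped_cdend):
    print(is_well_formed(sample))
```

With the expat-based parser, only the second sample should be rejected.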
The > symbol must be escaped like so:
<Document>
]]&gt;
</Document>
Now consider the ; symbol. It marks the end of an entity reference.
This is a well-formed XML document:
<Document>
A;B
</Document>
Why doesn't the XML parser see that and think that the ; marks the
end of an entity reference; why doesn't the XML parser throw an
error since there is no preceding & symbol?
Because an entity reference is not a "section" of parsed data ... it
is a concise markup construct. It is easy to detect the end of an
entity reference:
http://www.w3.org/TR/2008/REC-xml-20081126/#NT-EntityRef
Note how the content of an entity reference is a simple name.
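To illustrate why a lone ";" is harmless, here is a rough sketch of the EntityRef production ('&' Name ';') as a regular expression. The Name pattern below is a simplified, ASCII-only approximation (an assumption; the real production admits many more characters), and the function name entity_refs is hypothetical:

```python
import re

# Simplified, ASCII-only stand-in for the XML Name production.
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"
# EntityRef ::= '&' Name ';'
ENTITY_REF = re.compile("&" + NAME + ";")

def entity_refs(text):
    """Return every substring that looks like an entity reference."""
    return ENTITY_REF.findall(text)

print(entity_refs("A;B"))           # no "&", so nothing to terminate
print(entity_refs("A &amp; B; C"))  # only the ";" closing "&amp" matters
```

The ";" has meaning only as the last character of a construct that began with "&"; everywhere else it is ordinary character data.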
The content of a CDATA section is far more complex and so is
described using a wildcard:
http://www.w3.org/TR/2008/REC-xml-20081126/#NT-CData
Note the interesting quirk that within a CDATA section there is no
such thing as an embedded CDATA section ... the following is well-formed:
<?xml version="1.0" encoding="UTF-8"?>
<doc>
This is a <![CDATA[ section <![CDATA[ <![CDATA[ <![CDATA[ <![CDATA[ ]]>
</doc>
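The quirk can be observed directly. A small sketch, again assuming Python's xml.etree.ElementTree (the XML declaration is omitted because the document is passed as a Python string):

```python
import xml.etree.ElementTree as ET

# One CDATA section whose content happens to contain "<![CDATA[" as text.
doc = "<doc>\nThis is a <![CDATA[ section <![CDATA[ <![CDATA[ ]]>\n</doc>"
root = ET.fromstring(doc)

# The first "<![CDATA[" opens the section; the later ones are plain
# characters, and the single "]]>" closes the section.
print(repr(root.text))
```

The repeated "<![CDATA[" strings come back as ordinary text, and a single "]]>" is enough to end the section.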
CDATA sections are not allowed in attributes, while entity references are.
Parsed character data and CDATA sections are simply "different"
and so are treated differently when parsing.
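The attribute rule can be sketched the same way, assuming Python's xml.etree.ElementTree; the attribute name note and the variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

# An entity reference inside an attribute value is expanded as usual.
with_entity = ET.fromstring('<doc note="x &amp; y"/>')
print(with_entity.get("note"))

# A CDATA section inside an attribute value is rejected, since a literal
# "<" may not appear in an attribute value at all.
try:
    ET.fromstring('<doc note="<![CDATA[x]]>"/>')
    cdata_in_attribute_ok = True
except ET.ParseError:
    cdata_in_attribute_ok = False
print(cdata_in_attribute_ok)
```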
Why isn't the ; symbol a reserved symbol?
What do you mean by "reserved"?
It isn't available as a built-in character entity because it isn't
needed to disambiguate otherwise ambiguous strings found in parsed
character data.
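As a small illustration, escaping helpers touch only the characters that can be mistaken for markup. A sketch assuming Python's xml.sax.saxutils.escape, which by default rewrites "&", "<" and ">" and leaves ";" alone:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "A;B < C & D"
escaped = escape(raw)
print(escaped)  # the ";" passes through untouched

# Parsing the escaped text recovers the original string.
round_trip = ET.fromstring("<doc>" + escaped + "</doc>").text
print(round_trip == raw)

# If a semicolon ever needs to be written indirectly, a numeric
# character reference works for it as for any other character.
numeric = ET.fromstring("<doc>A&#59;B</doc>").text
print(numeric)
```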
And that is just how it is: it was so in SGML, and so it is in XML.
I hope this helps.
. . . . . . Ken
--
Public XSLT, XSL-FO, UBL & code list classes: Melbourne, AU May 2014 |
Contact us for world-wide XML consulting and instructor-led training |
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm |
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/ |
G. Ken Holman mailto:gkholman@CraneSoftwrights.com |
Google+ profile: http://plus.google.com/+GKenHolman-Crane/about |
Legal business disclaimers: http://www.CraneSoftwrights.com/legal |