OASIS Mailing List Archives


RE: Couldn't illegal XML characters be used simply by escaping them?

I don't think the character reference &#x0; is allowed. It's not so much that it is illegal; it's just a character XML does not support. Character references must match the production for Char, and 0 isn't one of those codes. Search the XML Recommendation for "[WFC: Legal Character]" to get the details.

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
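
As a rough illustration, the Char production can be turned into a small range check. This is only a sketch in Python; the names XML_CHAR_RANGES and is_xml_char are made up for this example and do not come from any library.

```python
# Code point ranges copied from the XML 1.0 Char production quoted above.
# XML_CHAR_RANGES and is_xml_char are illustrative names, not a library API.
XML_CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml_char(code_point: int) -> bool:
    """Return True if code_point falls inside one of the Char ranges."""
    return any(low <= code_point <= high for low, high in XML_CHAR_RANGES)


print(is_xml_char(0x0))        # False: null is outside every range
print(is_xml_char(ord("<")))   # True: '<' is reserved, but it is a legal character
```

A check like this only says whether a code point may appear in a document at all; it says nothing about whether the character needs escaping in a given context.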

Most of the APIs that process XML take care of escaping characters when they serialize XML and un-escaping them when an XML document is read.
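
For instance, here is a minimal sketch using Python's standard xml.etree.ElementTree module (just one API among many, chosen for illustration). The element name and text are borrowed from the example in the original message below.

```python
import xml.etree.ElementTree as ET

# Build an element whose text contains the reserved '<' character.
elem = ET.Element("Equation")
elem.text = "if A < B then ..."

# Serializing escapes the '<' automatically.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)  # <Equation>if A &lt; B then ...</Equation>

# Parsing un-escapes it again, so the application sees the original text.
parsed = ET.fromstring(serialized)
print(parsed.text)  # if A < B then ...
```

This only shows the handling of a reserved character; it does not imply that every serializer also checks for characters outside the Char production.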

-----Original Message-----
From: Costello, Roger L. [mailto:costello@mitre.org] 
Sent: Saturday, November 10, 2012 8:08 AM
To: xml-dev@lists.xml.org
Subject: [xml-dev] Couldn't illegal XML characters be used simply by escaping them?

Hi Folks,

This week I was in a discussion and the topic of illegal XML characters came up and someone asked: "Couldn't illegal XML characters simply be escaped?"

Here is my response. Is it correct? Complete? Easy to understand?

We need to distinguish between a reserved XML character and an illegal character.

The '<' symbol is a reserved XML character. If data contains that symbol it will confuse an XML Parser because the Parser will think, "Oh, a new element is being started."

For example, consider this:

<Equation>if A < B then ...</Equation>

That '<' symbol needs to be escaped. We can escape it using the built-in &lt; entity reference, or a character reference giving the decimal or the hexadecimal value of the symbol. Let's do the latter:

<Equation>if A &#x3C; B then ...</Equation>

Now the XML Parser is not confused into thinking that the XML is trying to start a new element. Note that the XML Parser does resolve the character entity reference and the output of the Parser is this:

<Equation>if A < B then ...</Equation>

We've made it past the Parser, so that '<' symbol is no longer a problem.

An important thing to note is that the '<' symbol is (obviously) a legal character.

The XML 1.0 specification lists those characters that may be used in an XML document (see below for a partial list). So some characters cannot be used in XML documents. For example, hex 0 (null) is not a legal XML character.

[Person I was talking to] your suggestion is to escape illegal characters like so:

<Test> Here is a null character: &#x0;</Test>

What will an XML Parser do with that character entity reference? It will resolve it (let (null) represent the null character):

<Test> Here is a null character: (null)</Test>

But now the output of the XML Parser is an XML document that contains an illegal character. Thus an error is thrown.

Recap: reserved characters may be used where they ordinarily would cause confusion by escaping them. But illegal characters may never be used and escaping them does not help.
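
The recap can be tried out with a short sketch. It assumes Python's standard xml.etree.ElementTree parser, and the helper name try_parse is made up for this example. With this particular parser, the null character reference is rejected at parse time, so no document containing the null character is ever produced.

```python
import xml.etree.ElementTree as ET


def try_parse(document: str):
    """Return the root element's text, or None if the document is rejected.

    try_parse is an illustrative helper, not part of any library.
    """
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# A reserved character, escaped with a hexadecimal character reference: accepted.
escaped_reserved = "<Equation>if A &#x3C; B then ...</Equation>"
print(try_parse(escaped_reserved))  # if A < B then ...

# An illegal character, "escaped" the same way: rejected by the parser.
escaped_illegal = "<Test> Here is a null character: &#x0;</Test>"
print(try_parse(escaped_illegal))   # None
```

Other parsers may word the error differently, but a conforming XML 1.0 parser is not expected to accept the second document.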


Decimal value of
US-ASCII character | Is an XML character?
     1             |  No
     2             |  No
     3             |  No
     4             |  No
     5             |  No
     6             |  No
     7             |  No
     8             |  No
     9             |  Yes
    10             |  Yes
    11             |  No
    12             |  No
    13             |  Yes
    14             |  No
    15             |  No
    16             |  No
    17             |  No
    18             |  No
    19             |  No
    20             |  No
    21             |  No
    22             |  No
    23             |  No
    24             |  No
    25             |  No
    26             |  No
    27             |  No
    28             |  No
    29             |  No
    30             |  No
    31             |  No
    32-127         |  Yes


