OASIS Mailing List Archives


Couldn't illegal XML characters be used simply by escaping them?

Hi Folks,

This week I was in a discussion where the topic of illegal XML characters came up, and someone asked: "Couldn't illegal XML characters simply be escaped?"

Here is my response. Is it correct? Complete? Easy to understand?

We need to distinguish between a reserved XML character and an illegal character.

The '<' symbol is a reserved XML character. If character data contains that symbol unescaped, it will confuse an XML Parser, because the Parser will think, "Oh, a new element is being started."

For example, consider this:

<Equation>if A < B then ...</Equation>

That '<' symbol needs to be escaped. We can escape it using the built-in &lt; entity reference, or using a character reference that gives the decimal (&#60;) or hexadecimal (&#x3C;) value of the symbol. Let's use the hexadecimal form:

<Equation>if A &#x3C; B then ...</Equation>

Now the XML Parser is not confused into thinking that the XML is trying to start a new element. Note that the XML Parser does resolve the character reference, so the data the Parser hands to the application is equivalent to this:

<Equation>if A < B then ...</Equation>

We've made it past the Parser, so that '<' symbol is no longer a problem.
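To make this concrete, here is a small sketch using Python's standard library (my choice for illustration; any conforming XML parser should behave the same way). The helper name parses is made up for this example. It shows that the three ways of escaping '<' all yield the same character data after parsing, and that the unescaped form is rejected.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if A < B then ..."

# escape() replaces '&', '<' and '>' with the predefined entity references.
escaped = escape(raw)

# Entity reference, decimal and hexadecimal character references:
# three spellings of the same character.
variants = [
    f"<Equation>{escaped}</Equation>",
    "<Equation>if A &#60; B then ...</Equation>",
    "<Equation>if A &#x3C; B then ...</Equation>",
]
parsed_texts = [ET.fromstring(doc).text for doc in variants]


def parses(doc: str) -> bool:
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# The unescaped '<' is taken as the start of markup and fails to parse.
unescaped_ok = parses(f"<Equation>{raw}</Equation>")

print(escaped)
print(parsed_texts)
print(unescaped_ok)
```

In every accepted variant the application sees the plain text with a literal '<' in it; the escaping exists only in the serialized document.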

An important thing to note is that the '<' symbol is (obviously) a legal character.

The XML 1.0 specification lists the characters that may be used in an XML document (see below for a partial list). Any character not on that list cannot be used in an XML document. For example, hex 0 (null) is not a legal XML character.

[Person I was talking to], your suggestion is to escape illegal characters like so:

<Test> Here is a null character: &#x0;</Test>

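What will an XML Parser do with that character reference? Conceptually, resolving it would give this (let (null) represent the null character):

<Test> Here is a null character: (null)</Test>

But that is an XML document that contains an illegal character. The XML 1.0 specification rules this out directly: a character reference must itself refer to a legal XML character. So &#x0; is a well-formedness error, and the Parser reports a fatal error instead of producing any output.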
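This is easy to check. The sketch below again uses Python's standard-library parser as a stand-in for "an XML Parser"; the helper name is_well_formed is hypothetical. It tries a reference to the null character, a reference to another illegal control character (hex 7, bell), and a reference to a legal one (hex 9, tab).

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# References to illegal characters are rejected even though they are "escaped".
null_ref_ok = is_well_formed("<Test> Here is a null character: &#x0;</Test>")
bell_ref_ok = is_well_formed("<Test> Here is a bell character: &#x7;</Test>")

# A reference to a legal character (tab) is accepted and resolved.
tab_ref_ok = is_well_formed("<Test>&#x9;</Test>")
tab_text = ET.fromstring("<Test>&#x9;</Test>").text

print(null_ref_ok, bell_ref_ok, tab_ref_ok, repr(tab_text))
```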

Recap: by escaping them, reserved characters may be used where they would otherwise be mistaken for markup. But illegal characters may never be used, and escaping them does not help.


Decimal value of
US-ASCII character | Is an XML character?
    0              |  No
    1              |  No
    2              |  No
    3              |  No
    4              |  No
    5              |  No
    6              |  No
    7              |  No
    8              |  No
    9              |  Yes
   10              |  Yes
   11              |  No
   12              |  No
   13              |  Yes
   14              |  No
   15              |  No
   16              |  No
   17              |  No
   18              |  No
   19              |  No
   20              |  No
   21              |  No
   22              |  No
   23              |  No
   24              |  No
   25              |  No
   26              |  No
   27              |  No
   28              |  No
   29              |  No
   30              |  No
   31              |  No
   32-127          |  Yes
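The table can be reproduced from the Char production in the XML 1.0 specification, which allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. Here is a minimal sketch of that rule as a Python predicate; the function name is_xml_char is my own, not something from the specification or a library.

```python
def is_xml_char(code_point: int) -> bool:
    """True if code_point matches the Char production of XML 1.0."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# The only legal characters below 32 are tab, line feed and carriage return.
legal_controls = [cp for cp in range(32) if is_xml_char(cp)]

# Print the US-ASCII portion in the same shape as the table above.
for cp in range(32):
    print(f"{cp:5d}              |  {'Yes' if is_xml_char(cp) else 'No'}")
print(f"   32-127          |  {'Yes' if all(map(is_xml_char, range(32, 128))) else 'No'}")
```

A check like this could be run over data before it is placed in an XML document, since no amount of escaping will make a character that fails it acceptable.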


Copyright 1993-2007 XML.org. This site is hosted by OASIS