XML.org | OASIS Mailing List Archives
Couldn't illegal XML characters be used simply by escaping them?

Hi Folks,

This week I was in a discussion and the topic of illegal XML characters came up and someone asked: "Couldn't illegal XML characters simply be escaped?"

Here is my response. Is it correct? Complete? Easy to understand?

We need to distinguish between a reserved XML character and an illegal character.

The '<' symbol is a reserved XML character. If data contains that symbol it will confuse an XML Parser because the Parser will think, "Oh, a new element is being started."

For example, consider this:

<Equation>if A < B then ...</Equation>

That '<' symbol needs to be escaped. We can escape it using the built-in &lt; entity reference, or a decimal (&#60;) or hexadecimal (&#x3C;) character reference. Let's do the latter:

<Equation>if A &#x3C; B then ...</Equation>

Now the XML Parser is not confused into thinking that the XML is trying to start a new element. Note that the XML Parser does resolve the character reference, so the output of the Parser (what the application sees) is this:

<Equation>if A < B then ...</Equation>

We've made it past the Parser, so that '<' symbol is no longer a problem.
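As a rough illustration of the two cases above, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name try_parse is made up for this example; it is not part of any library.

```python
import xml.etree.ElementTree as ET

# The two documents from the discussion above.
raw = "<Equation>if A < B then ...</Equation>"
escaped = "<Equation>if A &#x3C; B then ...</Equation>"


def try_parse(document):
    """Hypothetical helper: return the element text, or None if parsing fails."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


print(try_parse(raw))      # None: the bare '<' is not well-formed
print(try_parse(escaped))  # if A < B then ...
```

The parser hands back the text with a real '<' in it, which is the "resolved" output described above.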

An important thing to note is that the '<' symbol is (obviously) a legal character.

The XML 1.0 specification lists those characters that may be used in an XML document (see below for a partial list). Any character that is not on that list cannot be used in an XML document. For example, hex 0 (null) is not a legal XML character.

[Person I was talking to], your suggestion is to escape illegal characters like so:

<Test> Here is a null character: &#x0;</Test>

What will an XML Parser do with that character reference? It will try to resolve it (let (null) represent the null character):

<Test> Here is a null character: (null)</Test>

But the result would be an XML document that contains an illegal character. XML 1.0 requires the character named by a character reference to be a legal XML character, so the Parser rejects &#x0; itself and an error is thrown.
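This can be checked with a real parser. The sketch below assumes Python's standard xml.etree.ElementTree module (which is backed by expat); the helper name parse_error is hypothetical. The exact error message depends on the parser, so it is printed but not shown here.

```python
import xml.etree.ElementTree as ET


def parse_error(document):
    """Hypothetical helper: return the parser's error message, or None if the document parses."""
    try:
        ET.fromstring(document)
    except ET.ParseError as exc:
        return str(exc)
    return None


# A character reference to an illegal character (null) is rejected.
print(parse_error("<Test> Here is a null character: &#x0;</Test>"))

# A character reference to a legal character (tab) is accepted.
print(parse_error("<Test> Here is a tab character: &#x9;</Test>"))  # None
```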

Recap: reserved characters may be used where they would ordinarily cause confusion, provided they are escaped. But illegal characters may never be used, and escaping them does not help.
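The same distinction shows up on the writing side. The sketch below assumes Python's xml.sax.saxutils.escape, which escapes the reserved characters '&', '<' and '>' and leaves everything else alone. The helper names wrap and is_well_formed are made up for this example, and a vertical tab (decimal 11, marked "No" in the table below) stands in for the illegal character.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap(text):
    """Hypothetical helper: escape reserved characters and wrap the text in <Test>."""
    return "<Test>" + escape(text) + "</Test>"


def is_well_formed(document):
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(wrap("A < B"))                       # <Test>A &lt; B</Test>
print(is_well_formed(wrap("A < B")))       # True
print(is_well_formed(wrap("A \x0b B")))    # False: escaping leaves the illegal character in place
```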

/Roger

Decimal value of
US-ASCII character | Is an XML character?
------------------------------------------
    1              |  No
    2              |  No
    3              |  No
    4              |  No
    5              |  No
    6              |  No
    7              |  No
    8              |  No
    9              |  Yes
   10              |  Yes
   11              |  No
   12              |  No
   13              |  Yes
   14              |  No
   15              |  No
   16              |  No
   17              |  No
   18              |  No
   19              |  No
   20              |  No
   21              |  No
   22              |  No
   23              |  No
   24              |  No
   25              |  No
   26              |  No
   27              |  No
   28              |  No
   29              |  No
   30              |  No
   31              |  No
   32-127          |  Yes
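The table above can be reproduced from the Char production in the XML 1.0 specification, which allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. A minimal sketch in Python follows; the function name is_xml_char is made up for this example.

```python
def is_xml_char(code_point):
    """Hypothetical helper: True if code_point matches the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# The only legal characters below decimal 32 are tab, line feed and carriage return.
legal_controls = [cp for cp in range(0, 32) if is_xml_char(cp)]
print(legal_controls)  # [9, 10, 13]

# Every US-ASCII character from 32 through 127 is legal.
print(all(is_xml_char(cp) for cp in range(32, 128)))  # True
```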

