OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
Why the double escape for lt ? (that is <!ENTITY lt "&#38;#60;">)

Hello my dear XML friends,

In the XML spec, it says that the entities lt and amp must be declared (as an internal entity) with double-escaping.  e,g,
<!ENTITY lt "&#38;#60;”>
I see that this follows from the manner that the replacement text for internal entities is defined (that internal entities have character references expanded, but external entities do not).

This leads to two questions for me:
1) Why are internal and external entities’ replacement text handled differently?
2) What is an example of the case where, if the double-escaping wasn’t done, you would get an non-well-formed result (as implied by 4.6 ¶ 2)?

Thank you,



From: https://www.w3.org/TR/2006/REC-xml11-20060816/ 
Red highlighting added by me to emphasize the key points above.

4.5 Construction of Entity Replacement Text

In discussing the treatment of entities, it is useful to distinguish two forms of the entity's value. [Definition: For an internal entity, the literal entity value is the quoted string actually present in the entity declaration, corresponding to the non-terminal EntityValue.] [Definition: For an external entity, the literal entity value is the exact text contained in the entity.] [Definition: For an internal entity, the replacement text is the content of the entity, after replacement of character references and parameter-entity references.] [Definition: For an external entity, the replacement text is the content of the entity, after stripping the text declaration (leaving any surrounding white space) if there is one but without any replacement of character references or parameter-entity references.]

4.6 Predefined Entities

[Definition: Entity and character references may both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (ampltgtaposquot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.]

All XML processors must recognize these entities whether they are declared or not. For interoperability, valid XML documents should declare these entities, like any others, before using them. If the entities lt or amp are declared, they must be declared as internal entities whose replacement text is a character reference to the respective character (less-than sign or ampersand) being escaped; the double escaping is required for these entities so that references to them produce a well-formed result. If the entities gtapos, or quot are declared, they must be declared as internal entities whose replacement text is the single character being escaped (or a character reference to that character; the double escaping here is optional but harmless). For example:

<!ENTITY lt     "&#38;#60;">
<!ENTITY gt     "&#62;">
<!ENTITY amp    "&#38;#38;">
<!ENTITY apos   "&#39;">
<!ENTITY quot   "&#34;">

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 1993-2007 XML.org. This site is hosted by OASIS