OASIS Mailing List Archives
Re: [xml-dev] Double escaping

Lauren said “why are entity declarations for < and & double-escaped?” I said “huh?” She said “xml-dev is arguing about it”.  So, hi again.  

Thank you for your contribution to this discussion (and thanks to Lauren!)

This is sort of interesting.  If you go all the way back to the first edition of the spec and look at section 4.6, after the examples there's a one-liner that says:

Note that the < and & characters in the declarations of "lt" and "amp" are doubly escaped to meet the requirement that entity replacement be well-formed.

This is gone, and the paragraph before the examples is re-written, in all subsequent revisions starting with the Second Edition. It’s a long time since I’ve been inside an XML parser, and I confess I don’t fully grok why the recommendation says:

<!ENTITY lt     "&#38;#60;">

Rather than just 

<!ENTITY lt "&#60;”>

This is also essentially dictated by the definitions for Replacement text (though this just explains that one should, not why):
[Definition: For an internal entity, the replacement text is the content of the entity, after replacement of character references and parameter-entity references.]

[Definition: For an external entity, the replacement text is the content of the entity, after stripping the text declaration (leaving any surrounding white space) if there is one but without any replacement of character references or parameter-entity references.]

or, from 1.0
The literal entity value as given in an internal entity declaration (EntityValue) may contain character, parameter-entity, and general-entity references. Such references must be contained entirely within the literal entity value. The actual replacement text that is included as described above must contain the replacement text of any parameter entities referred to, and must contain the character referred to, in place of any character references in the literal entity value; however, general-entity references must be left as-is, unexpanded. For example, given the following declarations:

Now in fact, this is a “for interoperability” provision, which was code for “to work with SGML parsers”,

That is, indeed, probably the key point here.

and I have never encountered an XML document which actually declares &lt; or &amp;  

It’s a long time ago, but I’m pretty sure I didn’t work on any revision of the spec after the First Edition; I certainly don’t remember the discussion that led up to this change.  By that time, there would have been several seasoned XML Processor implementors in the discussion, and this would presumably reflect their experience.  

Indeed.  Thank you, again, for adding your bit of history.



Copyright 1993-2007 XML.org. This site is hosted by OASIS