Double escaping

Lauren said “why are entity declarations for < and & double-escaped?” I said “huh?” She said “xml-dev is arguing about it”. So, hi again.

This is sort of interesting. If you go all the way back to the first edition of the spec and look at section 4.6, after the examples there's a one-liner that says:

Note that the < and & characters in the declarations of "lt" and "amp" are doubly escaped to meet the requirement that entity replacement be well-formed.

This is gone, and the paragraph before the examples is re-written, in all subsequent revisions starting with the Second Edition. It’s a long time since I’ve been inside an XML parser, and I confess I don’t fully grok why the recommendation says:

<!ENTITY lt "&#38;#60;">

rather than just:

<!ENTITY lt "&#60;">
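For what it's worth, the difference between the doubly escaped form ("&#38;#60;") and the singly escaped one ("&#60;") can be poked at with a small experiment. What follows is only a sketch, not anything from the spec: it assumes Python's standard xml.etree.ElementTree (which sits on expat), and it uses a made-up entity name, "less", because a parser may handle the predefined "lt" internally and never consult the declaration. The helper name parse_text is likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_text(entity_value):
    """Declare a hypothetical entity 'less' with the given literal value,
    reference it once in element content, and return the parsed text."""
    doc = (
        f'<!DOCTYPE doc [<!ENTITY less "{entity_value}">]>'
        "<doc>a &less; b</doc>"
    )
    return ET.fromstring(doc).text


def is_well_formed(entity_value):
    """Return True if a document referencing the entity parses cleanly."""
    try:
        parse_text(entity_value)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Doubly escaped: the replacement text is the character reference
    # "&#60;", which is re-parsed at the point of use and becomes data.
    print(repr(parse_text("&#38;#60;")))

    # Singly escaped: the character reference is expanded when the
    # declaration is read, so the replacement text is a bare "<", which
    # is then re-parsed as the start of markup at the point of use.
    print(is_well_formed("&#60;"))
```

Under those assumptions, the doubly escaped declaration yields ordinary character data, while the singly escaped one leaves a raw "<" in the replacement text and the reference fails to parse. That matches the first edition's remark about entity replacement needing to be well-formed.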

Now in fact, this is a “for interoperability” provision, which was code for “to work with SGML parsers”, and I have never encountered an XML document which actually declares lt or amp.

It’s a long time ago, but I’m pretty sure I didn’t work on any revision of the spec after the First Edition; I certainly don’t remember the discussion that led up to this change. By that time, there would have been several seasoned XML Processor implementors in the discussion, and this would presumably reflect their experience.