Hi Folks,

XML is an abstraction. The abstraction is this: an XML document contains data, and the data is surrounded (delimited) by markers. More concretely, an XML document contains pairs of start-tags and end-tags; sandwiched between each pair is character data and possibly other start-tag, end-tag pairs.

The XML abstraction is leaky. The five reserved characters (less-than, greater-than, apostrophe, quote, ampersand) are leaks in the abstraction.
What does it mean for the XML abstraction to leak? It means that you, the user of the XML abstraction, must understand to some extent how the software that implements the XML abstraction (the XML processor) works internally. In particular, you must understand that, unlike all the other 100,000+ Unicode characters that you can put in your XML document, an XML processor treats five characters differently, and if you want to use one of these characters in your data then you must perform magic voodoo (you must escape the character, e.g., write &lt; for less-than and &amp; for ampersand) to get the XML processor to treat it as data rather than as markup. Strictly speaking, only less-than and ampersand must always be escaped in character data; the other three need escaping only in narrower contexts, such as quotes inside attribute values. In other words, you have to understand how the XML processor lexically tokenizes your XML document. That’s a leak in the abstraction. Leaks are bad. It’s best to create abstractions that don’t leak.

Does the XML abstraction leak in other ways? Does the JSON abstraction leak?
Does the Unicode abstraction leak? This issue of leaky abstractions is of great interest to me. If you have examples of other abstractions that leak, would you mind sharing them, please?

For more info on leaky abstractions, see Joel Spolsky’s article, "The Law of Leaky Abstractions" (http://www.joelonsoftware.com/articles/LeakyAbstractions.html).
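
To make the escaping point concrete, here is a minimal Python sketch. The choice of Python, of the standard-library xml.etree.ElementTree parser, and of xml.sax.saxutils.escape is mine, purely for illustration; the sample string and the "note" element name are made up. It shows one XML processor rejecting unescaped data and accepting the same data once it has been escaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample data containing two of the reserved characters.
data = "AT&T says 1 < 2"

# Unescaped: the processor reads '&' and '<' as the start of markup,
# so the document is not well-formed and parsing fails.
try:
    ET.fromstring("<note>" + data + "</note>")
    raw_parsed = True
except ET.ParseError:
    raw_parsed = False

# Escaped: escape() rewrites '&', '<' and '>' as entity references,
# and the processor hands back the original characters as data.
escaped = escape(data)
round_tripped = ET.fromstring("<note>" + escaped + "</note>").text

print(raw_parsed)      # False
print(escaped)         # AT&amp;T says 1 &lt; 2
print(round_tripped)   # AT&T says 1 < 2
```

The leak is visible in the first half: to know why that string fails, you have to know how the processor tokenizes its input.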
/Roger