I suppose one can conceptually divide the entire information content of any namespace-well-formed XML document into two parts:
- an infoset;
- everything that is not in the infoset.
I'll call the latter "the non-infoset (part)" of the XML document.
The non-infoset includes such things as:
- the use of general entities;
- the choice between apostrophe and quote as attribute value delimiter;
- the use of numeric character references vs. literal Unicode characters or (sometimes) predefined entity references;
- the nature and amount of whitespace inside tags;
- the choice between a start tag / end tag pair and an empty element tag;
- the inclusion of a DTD;
- the order of attributes (including the order of namespace attributes);
... and other things, including those listed in Appendix D of the Infoset recommendation.
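To make the distinction concrete, here is a rough sketch in Python using the standard library's `xml.etree.ElementTree`. The two sample documents and the helper names `summarize` and `same_content` are made up for illustration. Note that ElementTree is not an Infoset implementation (for example, it discards comments and namespace prefixes by default), so this only approximates the idea: two documents that differ in several of the non-infoset features listed above look identical once parsed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: double-quoted attributes, a predefined entity
# reference, an empty-element tag, and a numeric character reference.
DOC_A = '<doc a="1" b="x &amp; y"><empty/><t>caf&#233; ok</t></doc>'

# Same content, different "non-infoset": a DTD with a general entity,
# apostrophe delimiters, reordered attributes, extra whitespace inside
# the tag, a start tag / end tag pair, and a literal Unicode character.
DOC_B = (
    '<!DOCTYPE doc [<!ENTITY status "ok">]>\n'
    "<doc   b='x &#38; y'\n     a='1' >"
    "<empty></empty><t>caf\u00e9 &status;</t></doc>"
)


def summarize(elem):
    """Reduce an element to plain data: name, attributes, text, children."""
    return (
        elem.tag,
        dict(elem.attrib),  # dict comparison ignores attribute order
        elem.text or "",
        [summarize(child) for child in elem],
        elem.tail or "",
    )


def same_content(xml_a, xml_b):
    """True if both documents parse to the same tree of plain data."""
    return summarize(ET.fromstring(xml_a)) == summarize(ET.fromstring(xml_b))


print(same_content(DOC_A, DOC_B))
```

A consumer working at this level cannot tell which of the two documents the producer actually wrote, which is the sense in which the differences fall outside the infoset.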
I am *not* trying to suggest that the non-infoset of an XML document is not important (in an absolute sense). I understand that, given an existing XML document, in many cases one may want all that information to be preserved. (I am *not* advocating a simplification of XML.)
However, I would like to ask the following question. How many **producers** of XML documents really care about the non-infoset part of the XML document that they are **producing**?
Or equivalently: How many **producers** of XML would be happy to live in a world in which the non-infoset part was put outside their control? Say, a world in which they weren't able to choose the attribute value delimiter, or the use of numeric character references? How many producers need the ability to use general entities? How many producers need the ability to include a DTD in the XML they are creating?
If the answer to these questions were that not many XML producers really care about the non-infoset, one could conclude that the XML infoset **is** important, more so than some believe, whether or not people actually use the term "infoset" in their work.