then they are limited to 99,999,999 characters, which is the maximum that SGML allowed (or something like that: TOTALCAP). AFAIK XML documents with this restriction would be
of the theoretical class recursive rather than recursively enumerable.
But, on a more general point, the problem underlying Roger's comments is, I think, that considering documents using language-theoretic ideas is
probably not as useful as it may appear:
Take problem 4 above. If we have a document (i.e. it is sitting, finite, in memory), then finding out whether an IDREF matches an ID is a linear O(n) operation (2 passes).
In a way, this gives us a two-stage grammar: an upper language (the document is a sequence of bytes that must fit into virtual memory) and a lower language (XML or
whatever), where the upper language excludes infinite documents, and with them some of the theoretical characteristics that attach to them.
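To make the two-pass point concrete, here is a minimal sketch in Python. It assumes the whole document is already in memory, and it assumes (purely for illustration) that IDs live in an attribute named "id" and references in one named "idref"; in real XML the ID and IDREF types come from the DTD, not from the attribute name. The function name dangling_idrefs is made up for this example.

```python
import xml.etree.ElementTree as ET

def dangling_idrefs(xml_text, id_attr="id", idref_attr="idref"):
    """Return the IDREF values that match no ID in the document.

    The attribute names are assumptions for this sketch; a real
    validator would take the ID/IDREF attribute types from the DTD.
    """
    root = ET.fromstring(xml_text)  # the document is finite and in memory

    # Pass 1: collect every ID value into a set.
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}

    # Pass 2: check each reference against the set.
    return [
        el.get(idref_attr)
        for el in root.iter()
        if el.get(idref_attr) is not None and el.get(idref_attr) not in ids
    ]

doc = '<doc><sec id="a"/><ref idref="a"/><ref idref="b"/></doc>'
print(dangling_idrefs(doc))  # ['b']
```

Each pass visits every element once, and the set lookups are constant time on average, so the whole check is linear in the size of the document, forward references included.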
SGML allowed a document to state how much capacity it required, and an implementation to state what capacity it provided: for example, the number of attributes it could handle.
This is a relic of 1980s capacities, but it is amazing that 30 years later there are still issues in the same kind of area!
Cheers
Rick