Re: [xml-dev] Debating "civil disobedience" against overly complicated specs
- From: Rick Jelliffe <email@example.com>
- To: firstname.lastname@example.org
- Date: Mon, 24 Sep 2001 20:43:58 +1000
From: "Eric van der Vlist" <email@example.com>
> Noah Mendelsohn made a good point, noting that any application of XML
> is defining a subset of XML.
>  http://lists.w3.org/Archives/Public/xml-dist-app/2001Sep/0206.html
> When you think about it, specifying a schema (with a DTD, a W3C XML
> Schema or any other language) is defining the subset that your
> application will allow within all the possible XML documents.
There are differences between
* what markup a parser will accept (i.e., WF XML)
* what information a parser will report (i.e., the information set)
* what information an application can use (i.e., the document type).
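As a rough illustration of the three layers, here is a minimal Python sketch using the standard-library ElementTree. The document type (the set `ALLOWED_ELEMENTS`) and the function `classify` are hypothetical, invented for this example. ElementTree reports only part of the information set (it drops comments by default, for instance), so it stands in loosely for the middle layer.

```python
import xml.etree.ElementTree as ET

# Hypothetical document type: the element names this application understands.
ALLOWED_ELEMENTS = {"a", "b"}


def classify(doc: str) -> str:
    # Layer 1: what markup the parser will accept (well-formed XML).
    try:
        root = ET.fromstring(doc)
    except ET.ParseError as err:
        return f"not well-formed: {err}"

    # Layer 2: what information the parser reports. ElementTree exposes
    # tags, attributes and text, i.e. a subset of the infoset.
    reported = [(elem.tag, dict(elem.attrib)) for elem in root.iter()]

    # Layer 3: what information the application can use (its document type).
    for tag, _attrs in reported:
        if tag not in ALLOWED_ELEMENTS:
            return f"not valid for this document type: unexpected <{tag}>"
    return "ok"
```

A document can fail at any one layer independently: `<a>` fails the first, while `<a><c/></a>` is well-formed but fails the third.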
When we find that our document type does not need certain features, it is
natural to want to remove them from the information set
or from what the parser recognises. I don't see anything wrong with
a parser implementing only the markup recognition that a particular
application needs, as long as it keeps the layering clear: it should reject,
say, attributes not because of an XML error but because of an early-caught
document-type error.
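A minimal sketch of that layering, assuming Python's standard-library expat binding: the start-element handler refuses attributes during the parse, but reports this as a document-type error rather than an XML syntax error. `DocumentTypeError` and `parse_without_attributes` are hypothetical names chosen for illustration.

```python
import xml.parsers.expat
from typing import List


class DocumentTypeError(Exception):
    """Hypothetical: well-formed XML that lies outside the document type."""


def parse_without_attributes(doc: str) -> List[str]:
    """Collect element names, rejecting attributes as a document-type error."""
    names: List[str] = []

    def start(name, attrs):
        # Expat has already recognised the attributes as well-formed markup;
        # rejecting them here is an application decision, not an XML error.
        if attrs:
            raise DocumentTypeError(
                "attributes are not valid in the document types I understand"
            )
        names.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    # Genuine syntax errors still surface separately, as ExpatError.
    parser.Parse(doc, True)
    return names
```

Note that expat has fully parsed the attributes by the time the handler sees them, so this only moves the rejection into the right layer; it saves no parsing work.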

Indeed, I guess there might be some efficiency in a schema system acting
as a factory for creating optimal parsers (this is perhaps what
SGML systems did for delimiters, and what the GROVE ideas tried to
allow for the infoset), but the cost in complexity is quite high.
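A toy version of the "schema as parser factory" idea, assuming a made-up schema format (a dict mapping each element name to the attribute names it may carry). `make_checker` and `DocumentTypeError` are hypothetical. This sketch only specialises the handlers on top of a general-purpose expat parser; a factory that generated genuinely optimised tokenizers is where the real efficiency, and the real complexity, would lie.

```python
import xml.parsers.expat
from typing import Callable, Dict, FrozenSet

# Hypothetical toy schema: element name -> attribute names it may carry.
Schema = Dict[str, FrozenSet[str]]


class DocumentTypeError(Exception):
    """Hypothetical: well-formed XML that the schema does not allow."""


def make_checker(schema: Schema) -> Callable[[str], None]:
    """Return a checker specialised to one schema."""

    def check(doc: str) -> None:
        def start(name, attrs):
            if name not in schema:
                raise DocumentTypeError(f"element <{name}> is not in the document type")
            extra = set(attrs) - schema[name]
            if extra:
                raise DocumentTypeError(
                    f"attribute(s) {sorted(extra)} not allowed on <{name}>"
                )

        # A fresh expat parser per document: a parser cannot be reused
        # after its final Parse() call.
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = start
        parser.Parse(doc, True)

    return check
```

Usage: `check = make_checker({"a": frozenset({"b"}), "x": frozenset()})` accepts `<a b="c"><x/></a>` but raises `DocumentTypeError` for `<a z="1"/>` or `<y/>`.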

If an application rejects <a b="c"/> because it does not understand
attributes, then it should say "attributes are not valid in the document
types I understand" rather than "syntax error: not well-formed according
to the subset of XML I understand". (The problem, of course, is
that by the time one had implemented error reporters smart enough
to know that the problem was caused by attributes, one might just as
well have provided the attribute parsing in the first place.)