Re: Are we losing out because of grammars?
- From: Rick Jelliffe <ricko@allette.com.au>
- To: xml-dev@lists.xml.org
- Date: Fri, 02 Feb 2001 19:36:53 +0800
From: James Clark <jjc@jclark.com>
>Whilst I think the approach used by Schematron is a valuable complement
>to grammar based schemas (obviously I'm personally delighted to see
>XPath getting used for validation), I really find it very hard to take
>seriously the idea that the time has come to completely discard grammars
>in favour of path-based rule systems.
XPath and XSLT are great.
>Let's take a really simple example:
>
><!ELEMENT a (b?, c)>
><!ELEMENT b (#PCDATA)>
><!ELEMENT c (#PCDATA)>
If efficiency and terseness are the criteria, what about:
<pattern>
<rule context="a">
<assert test=
"b[1][following-sibling::c[position()=last()]] or
c[1][position()=last()]" />
</rule>
<rule context="b[* or @*] | c[* or @*]">
<report test="1=1" >Should be empty.</report>
</rule>
</pattern>
This has 5 functioning elements compared to TREX's 6. It only requires
looking at the first child. (This is an example of elaborating each path,
which is nasty for larger rules.) But it is not particularly the way I'd
envision people using Schematron.
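For readers who want to see the intended constraint spelled out procedurally, here is a minimal sketch in Python of what the DTD above is saying: an optional b followed by exactly one c, with b and c holding only text. It is an illustration under stated assumptions, not a translation of the Schematron rules: the function names check_a and validate are made up for this example, and only the standard library's xml.etree.ElementTree is used.

```python
import xml.etree.ElementTree as ET


def check_a(a):
    """Return a list of plain-language problems found in one <a> element."""
    problems = []
    # Child element names, in document order.
    names = [child.tag for child in a]
    # (b?, c): either a lone c, or a b followed by a c.
    if names not in (["c"], ["b", "c"]):
        problems.append("a must contain an optional b followed by exactly one c")
    for child in a:
        # b and c hold character data only: no child elements, no attributes.
        if child.tag in ("b", "c") and (len(child) or child.attrib):
            problems.append(f"{child.tag} should have no child elements or attributes")
    return problems


def validate(xml_text):
    """Check every <a> element in the document and collect the problems."""
    root = ET.fromstring(xml_text)
    problems = []
    for a in root.iter("a"):
        problems.extend(check_a(a))
    return problems
```

Each problem comes back as a sentence, which is the same "say why" idea as the assert and report messages.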
This can be pretty printed to give a very direct list of rules about the
schema. I note again that the comprehensibility of a schematron schema comes
not from its paths (though often these are simple) but because there is a
pretty direct path for making everything explicit in simple natural language
statements. If one element can follow another, we can explain "why".
But let's try a different example, quid pro quo. This is a real one, coming
from discussion on how to mark up news stories. The client gave the
following requirement:
"Every news story must have elements to mark up who, what, where, when
and how. There must be one and only one of each in every story. They can
appear anywhere."
This requirement is very easy to express in words. It is also trivially
easy to express in Schematron:
<pattern>
<rule context="/">
<assert test="count(//news:who)=1 and count(//news:what)=1
and count(//news:where)=1 and count(//news:when)=1
and count(//news:how)=1"
>Every news story must have elements to mark up who, what,
where, when and how. There must be one and only one in every
story. They can appear anywhere.</assert>
</rule>
</pattern>
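The same counting rule can be sketched outside Schematron, to show how little machinery it needs. This is only a rough equivalent in Python using the standard library. The namespace URI bound to the news prefix is not given in the requirement, so NEWS_NS below is a hypothetical placeholder, and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real one for the "news" prefix is not stated.
NEWS_NS = "http://example.org/news"
REQUIRED = ("who", "what", "where", "when", "how")


def count_required(xml_text):
    """Count each required element anywhere in the document."""
    root = ET.fromstring(xml_text)
    return {
        name: sum(1 for _ in root.iter(f"{{{NEWS_NS}}}{name}"))
        for name in REQUIRED
    }


def story_is_valid(xml_text):
    """True when there is one and only one of each required element."""
    return all(n == 1 for n in count_required(xml_text).values())
```

Like the rule above, the check says nothing about where the elements sit or what else the document contains, which is why it can be added to an existing open schema without disturbing it.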
One can take these constraints and add them to any schematron schema without
change and it will work. (If the other schema is closed, then that
is an internal inconsistency, which is a different matter. However,
schematron schemas are open by default.)
Let's say we add this to a schematron schema for full DocBook. The addition
in Schematron is just a single rule, and it will fit in with all the
constraints already in place. It seems to me that this would cause a
grammar-based schema language to explode, if it could cope at all: XML
Schemas could not cope (if the schema had used <or> groups, and we have to
assume that there are already "<any>" wildcards in place so these new
elements are allowed anywhere.)
>It seems self-evident to me that both grammars and path-based rule
>systems have their place. Some problems can be solved most conveniently
>with just grammars, some most conveniently with just path-based rule
>systems and some most conveniently with a combination. Can't we just
>leave it at that? What is the point of this crusade against grammars?
Oh, it is no crusade against grammars. My point all along has been that
there has never been any discussion which establishes grammars are best or
don't cause more problems (for people implementing ad hoc editing systems
for example) than they solve. "A life unexamined is not worth living."
James, Murata-san and I (through my work with XML Schemas) are merrily
foisting grammar-based systems on the world; people may perhaps expect that
there is some particular reason, perhaps so obvious it never has needed to
be stated, why grammars should be regarded by technocrats (even those
outside Asia) as the best approach to take in precedence to others. I don't
mean James or Murata-san need to justify why they take an approach, though I
think the XML Schema WG perhaps should. I am suggesting that the real reason
for this global phenomenon is closer to "err we haven't tried or considered
anything else."
There are now three credible second-generation grammar-based XML schema
languages. If we include the first generation (SOX 1 & 2, DDML, XML Data &
XDR, DSD, etc) that gives us perhaps 10 grammar-based schema languages. If
the use of the grammar:rules paradigm is over 10:2 (I include XLinkIt with
Schematron) and implementation is 50:1 then people may draw the conclusion
that most experts and implementors think that grammars are better than rules
for schema languages, rather than that implementors are trying to find the
most general and powerful replacement for DTDs within the grammar framework
*without* considering the underlying issue of what is best.
Cheers
Rick Jelliffe