Re: Are we losing out because of grammars?



Whilst I think the approach used by Schematron is a valuable complement
to grammar-based schemas (obviously I'm personally delighted to see
XPath getting used for validation), I really find it very hard to take
seriously the idea that the time has come to completely discard grammars
in favour of path-based rule systems.

Let's take a really simple example:

<!ELEMENT a (b?, c)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (#PCDATA)>

or as a TREX pattern:

<element name="a">
  <optional>
    <element name="b">
      <anyString/>
    </element>
  </optional>
  <element name="c">
    <anyString/>
  </element>
</element>

Let's try and write a Schematron schema that accepts exactly this
grammar.  I'm no Schematron expert, but here's my naive attempt (actual
text of error messages omitted):

<schema>
  <pattern>
   <rule context="a">
     <assert test="count(c)">...</assert>
     <assert test="count(*) = count(b|c)">...</assert>
     <assert test="count(c) &lt;= 1">...</assert>
     <assert test="count(b) &lt;= 1">...</assert>
     <assert test="not(b[preceding-sibling::c])">...</assert>
     <assert test="not(@*)">...</assert>
   </rule>
   <rule context="b|c">
     <assert test="not(*)">...</assert>
     <assert test="not(@*)">...</assert>
   </rule>
  </pattern>
</schema>
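
As a rough cross-check of the intent, the constraints this schema is meant to express can be spelled out one by one in ordinary code. The following is only a sketch in Python using the standard `xml.etree.ElementTree` module; it is not Schematron, and the helper name `check_a_rules` and the message strings are made up for illustration. It also simplifies the second rule by looking only at b and c children of the "a" element rather than at every b or c in the document.

```python
import xml.etree.ElementTree as ET


def check_a_rules(a):
    """Return the failed constraints for an <a> element (hypothetical helper)."""
    children = list(a)
    tags = [child.tag for child in children]
    failures = []

    # count(c) >= 1 and count(c) <= 1: exactly one c is required.
    if tags.count("c") < 1:
        failures.append("a must contain a c")
    if tags.count("c") > 1:
        failures.append("a may contain at most one c")
    # count(*) = count(b|c): no child elements other than b and c.
    if any(tag not in ("b", "c") for tag in tags):
        failures.append("a may contain only b and c")
    # count(b) <= 1: b is optional but not repeatable.
    if tags.count("b") > 1:
        failures.append("a may contain at most one b")
    # No b may have a preceding sibling c, i.e. b must come first.
    if any(tag == "b" and "c" in tags[:i] for i, tag in enumerate(tags)):
        failures.append("b must not follow c")
    # No attributes on a.
    if a.attrib:
        failures.append("a must have no attributes")

    # Simplification: only b and c children of this a are checked here.
    for child in children:
        if child.tag in ("b", "c"):
            if len(child) > 0:
                failures.append(child.tag + " must have no child elements")
            if child.attrib:
                failures.append(child.tag + " must have no attributes")
    return failures
```

Calling `check_a_rules(ET.fromstring("<a><c>y</c><b>x</b></a>"))` reports only the ordering failure, while `<a><b>x</b><c>y</c></a>` and `<a><c>y</c></a>` produce an empty list. Note that each constraint is a separate scan of the child list.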

I do not understand how this can possibly be considered an improvement
over the grammar.  For this case, the Schematron schema seems much
harder to write. Even for this trivial case, it's not obvious whether it
accurately captures my intent. Moreover, any current implementation is
likely to be much less efficient than the corresponding grammar-based
approach; it will need random access to the instance document, and will
probably make about 6 or 7 passes over the children of the "a" element
in order to validate it.  By contrast, a grammar-based implementation
can do it in a single, streaming pass.  Maybe some future Schematron
implementation will be able to optimize it, but that's just speculation,
whereas efficient implementation of grammars is well-established.
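
To make the single-pass claim concrete, here is a minimal sketch of how the content model (b?, c) can be checked with a small hand-written state machine that sees each child name exactly once and never looks back. It is written in Python purely for illustration; the function name `matches_optional_b_then_c` and the state names are assumptions, and a real grammar-based validator would derive the automaton from the schema rather than hard-code it.

```python
def matches_optional_b_then_c(child_names):
    """Check a sequence of child element names against (b?, c) in one pass.

    child_names may be any iterable, including a generator fed by a
    streaming parser, since each name is consumed once and never revisited.
    """
    state = "start"
    for name in child_names:
        if state == "start" and name == "b":
            state = "after_b"   # consumed the optional b
        elif state in ("start", "after_b") and name == "c":
            state = "done"      # consumed the required c
        else:
            return False        # unexpected element, or anything after c
    return state == "done"      # the required c must have been seen
```

For example, `matches_optional_b_then_c(iter(["b", "c"]))` accepts, while `["c", "b"]`, `["b"]`, and `["b", "b", "c"]` are all rejected, in each case after at most one look at each child.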

It seems self-evident to me that both grammars and path-based rule
systems have their place.  Some problems can be solved most conveniently
with just grammars, some most conveniently with just path-based rule
systems and some most conveniently with a combination.  Can't we just
leave it at that?  What is the point of this crusade against grammars?

James