RE: [xml-dev] Victory has been declared in the schema wars ...


From: "Rick Jelliffe" <rjelliffe@allette.com.au>
To: "Costello, Roger L." <costello@mitre.org>
Date: Thu, 30 Nov 2006 00:19:55 +1100 (EST)

Costello, Roger L. said:
> Rick Jelliffe wrote:
>
> [Assertion] grammars are a bad foundation

> [Assertion] Schemas should be based on paths

> These are two powerful assertions Rick.
>
> Would you explain why grammar-based languages (e.g., DTD, XSD, RNG) are
> a bad foundation?  A grammar-based language tells an instance document
> author what tags are allowed, how those tags may be arranged, and the
> datatypes of the data.  Isn't that what we want from a schema language?
> Shouldn't that information serve as the foundation for a schema?
>
> Also, would you explain why paths (XPaths) provide a more suitable
> foundation?

OK

> [Assertion] grammars are a bad foundation

1) They encourage or allow implicit structures (repetition groups are an
unmarked-up structure) and so are antagonistic to the fight for tagging:
explicit, simple, natural-language labels for structure down to a certain
grain.

2) They divorce constraints from the business reasons for those constraints,
and so are antagonistic to the goals of transparency: constraints and
capabilities are costs (and opportunities) that need to be justifiable
against business requirements.

3) They are too weak to model non-regular patterns, and so are
antagonistic to the goal of being able to represent arbitrary data that
can be in any arbitrary graph: with grammars* you have to abandon

> [Assertion] Schemas should be based on paths

1) They don't have the problems above

2) Processing XML very often involves XPath processing or transformations
currently. One reason some people find some XML processing pipelines to be
too heavyweight is that when you first validate against a state machine and
then use XPath matching to transform the document, you are in effect
repeating pattern-matching using two passes (with different technologies)
where it could all be done in one pass. (Indeed it opens up the door for
local validation modes, where you only validate the elements that are
actually used in a transformation.)

3) If the things that grammars can express and paths cannot are, as it
seems to me, too often an unhelpful set, while the things that paths can
express and grammars cannot are a very helpful set, then consolidation
should take place in the direction of paths.

4) Path expression implementations seem to be smaller than XML Schema
implementations, especially if we are just talking about the streaming
subset where you don't include the extended forward axes: descendant,
following-sibling and so on.
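
A minimal sketch of points 2 and 4, assuming a streaming matcher that
supports only child steps. The function name stream_check, the sample
document and the Python predicate standing in for a Schematron assert are
all hypothetical; this is an illustration, not how any particular
Schematron implementation works.

```python
import io
import xml.etree.ElementTree as ET

def stream_check(xml_bytes, path, check):
    """Return (passed, failed) text values for elements matched by `path`.

    Only child steps are supported (for example "address/postcode"), so a
    stack of open element names is all the state the matcher needs.
    """
    steps = path.split("/")
    stack, passed, failed = [], [], []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            stack.append(elem.tag)
            continue
        # On "end" the element text is complete, so it can be tested now.
        if stack[-len(steps):] == steps:
            # The same match serves both validation and extraction.
            (passed if check(elem.text) else failed).append(elem.text)
        stack.pop()
        elem.clear()  # drop the finished element's content

    return passed, failed

# Hypothetical sample data, reusing the address/postcode names from this message.
DOC = (b"<order>"
       b"<address><postcode>2000</postcode></address>"
       b"<address><postcode>999</postcode></address>"
       b"<postcode>5</postcode>"
       b"</order>")

passed, failed = stream_check(DOC, "address/postcode",
                              lambda text: float(text) > 1000)
print(passed, failed)
```

The postcode that sits directly under order is never tested, because its
ancestor chain does not end in address/postcode. That is the "local
validation" idea: only the elements a path actually selects get checked.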

---------------
What about Schematron and paths for uses beyond simple validation?

As far as the idea that you need a grammar to do storage type annotation,
it is perfectly feasible to have a pattern like this:

  <sch:rule context="address/postcode"  xxx:type="xs:short">
    <sch:assert test="number(.) &gt; 1000">
     The post code should be greater than 1000
    </sch:assert>
  </sch:rule>

Or that you need a grammar to do semantic annotation (there is already a
role attribute in Schematron):

  <sch:rule context="address/postcode"  role="AustralianPostalCode" >
    <sch:assert test="number(.) &gt; 1000">
     The post code should be greater than 1000
    </sch:assert>
  </sch:rule>

Note that using paths does make completeness-checking a function of the
schema-creation environment rather than being a necessary property of the
schema. However, it also means that open or partial schemas are simpler to
model.

Note also that it is possible to have declarative labeling of patterns
that can allow optimized evaluation. Use abstract patterns, something like
this (sorry, untested):

 <sch:pattern abstract="true" id="ALL">
   <sch:rule context=" $parent">
      <sch:assert test="count( $child ) = count(*)">
      Only the elements in <sch:value-of select=" $child "/> are allowed
      </sch:assert>
   </sch:rule>
   <sch:rule context=" $child ">
      <sch:assert test=" not(parent::$parent) or
          count( ../*[name() = name(current())] ) &lt;= 1">
       The following elements can appear at most once
        <sch:value-of select=" $child "/>
      </sch:assert>
   </sch:rule>
 </sch:pattern>

 <sch:pattern is-a="ALL">
   <sch:param name="parent"  value="  xxx:address "/>
   <sch:param name="child"
              value=" number | street | town | state | country" />
 </sch:pattern>

where the idea is that formulating the abstract patterns for an
implementation is an expert task (e.g. for a vendor or consortium) while
using the abstract pattern is more a schema developer task.
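
To make the division of labour concrete, here is a rough sketch of how an
is-a pattern might be expanded, assuming parameters are substituted as
plain text into the expressions of the abstract pattern before any XPath
is parsed. The helper name instantiate is hypothetical, and trimming the
parameter values is a simplification made here for readability.

```python
import re

def instantiate(expression, params):
    """Replace each $name in an abstract-pattern expression with its value."""
    def substitute(match):
        # Assumption: surrounding whitespace in the value is not significant.
        return params[match.group(1)].strip()
    return re.sub(r"\$([A-Za-z_][\w.-]*)", substitute, expression)

# Parameter values as given in the is-a pattern.
params = {
    "parent": "  xxx:address ",
    "child": " number | street | town | state | country",
}

print(instantiate(" $parent", params))
print(instantiate("count( $child ) = count(*)", params))
```

After expansion the first rule is an ordinary rule whose context is
xxx:address, so the schema developer only supplies the two parameter
values and never has to touch the XPath written by the pattern author.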

Cheers
Rick Jelliffe

(Of course, even regular grammars can be made more powerful by making each
term contain an XPath and having some kind of axis iteration for each
step.

I described this in 1999 as "Axis Expressions" and RELAX NG took it up as
far as adding attributes to content models; but no-one has ever taken it
to its logical extreme and allowed any XPath expression as the particle of
a content model...perhaps the nearest is Philippe P's ALS schema language
(is that what it is called? Yikes my memory is so bad today...the
interesting French one with explicit if clauses in content models.) Viewed
in terms of Axis Expressions, currently we have a choice between
powerful grammars over a basic axis (DTD, XSD, RELAX NG just on the child
and preceding-sibling::*[1] axes) or very basic grammar (if we treat the
Schematron elements as a kind of grammar that sequences constraints) with
very powerful axes.)

