Modeling ER schemas using Schematron [corrected]


From: "Rick Jelliffe" <rjelliffe@allette.com.au>
To: "Rick Jelliffe" <rjelliffe@allette.com.au>
Date: Thu, 30 Nov 2006 12:38:19 +1100 (EST)

[Re-send with minor corrections, sorry.]
[Roger asked why I think paths/Schematron is better than grammars/XSD.
Here is a more concrete example of how it can be more declarative and so
better for retargeting, more flexible for modeling, and how it doesn't
necessarily impose a different conceptual step the way that grammars can.]

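ISO Schematron introduces a macro layer, "abstract patterns", that allows
higher-level specification of constraints in Schematron. An abstract
pattern is a template whose rules refer to parameters such as $name; a
pattern with an is-a attribute instantiates it by supplying values for
those parameters, which are substituted textually. (There is a
pre-processor available that works with Schematron 1.6 for this.)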

This allows us to directly convert from, say, an ER diagram into
Schematron. You don't need to go through grammars; you can avoid the
ridiculous situation where you make one set of ER diagrams for your data
model, then you have to make another set of diagrams for the XML using
your XML schema IDE.

Plus, you can have a schema that doesn't care which serialization strategy
was used: it can support several. For example, in the following
mini-example, entities in a one-to-one relation can be nested, or they can
be linked using an ID-like mechanism. XSD is not smart enough to allow this
kind of alternative: it forces people to commit to one serialization
strategy, and because different people make different choices, the result
is incompatibility.

Using this mechanism, you can get a complete separation between the
declarative portions (which can carry as much additional declarative
information as you like) and the operational/implementation code.

<sch:pattern is-a="ENTITY" >
   <sch:param name="name" value="Address" />
</sch:pattern>

<sch:pattern is-a="FIELD">
   <sch:param name="entity" value="Address"/>
   <sch:param name="name" value="Street"/>
   <sch:param name="type" value="xs:string"/>
   <sch:param name="required" value="true" />
</sch:pattern>

<sch:pattern is-a="FIELD">
   <sch:param name="entity" value="Address"/>
   <sch:param name="name" value="Town"/>
   <sch:param name="type" value="xs:string"/>
   <sch:param name="required" value="true" />
</sch:pattern>

<sch:pattern is-a="FIELD">
   <sch:param name="entity" value="Address"/>
   <sch:param name="name" value="Postcode"/>
   <sch:param name="type" value="xs:short"/>
   <sch:param name="required" value="false" />
</sch:pattern>

<sch:pattern is-a="ONE-TO-ONE-RELATION">
   <sch:param name="from" value="Person"/>
   <sch:param name="to" value="Address"/>
</sch:pattern>

How easy is that?  And, in particular, in what way is that more
complicated than XML Schemas?
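To make the two serialization strategies concrete, here is a hypothetical
instance document in which one Person nests its Address and another links
to it by attribute. The Directory wrapper, the addr-2 ID and all data
values are invented for illustration. Assuming implementations of FIELD
and ONE-TO-ONE-RELATION along the lines of those given below, both
Persons should satisfy the same declarations:

```xml
<!-- Hypothetical instance: Person/Address/Street/Town/Postcode come from
     the ER declarations; Directory and all values are illustrative -->
<Directory>
  <!-- Strategy 1: the Address is nested inside its Person -->
  <Person>
    <Address>
      <Street>1 George Street</Street>
      <Town>Sydney</Town>
      <Postcode>2000</Postcode>
    </Address>
  </Person>

  <!-- Strategy 2: the Person points to its Address with an attribute
       named after the target entity, and the Address points back with
       an attribute named after the source entity, carrying the same ID -->
  <Person Address="addr-2"/>
  <Address Person="addr-2">
    <Street>2 Pitt Street</Street>
    <Town>Sydney</Town>
  </Address>
</Directory>
```

The declarations themselves never had to choose between the two shapes.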

These declarations are purely declarative, IYSWIM, which makes them very
easy to use for other purposes. In fact, it means that with Schematron
syntax you can model your information using extensible collections of
name-value pairs (can someone say "tuple"?), including metadata that you
won't be using in any assertions. You use Schematron abstract patterns to
capture all the data and metadata about some information, then decide
which parts of it you want to make assertions about; because everything
has been captured, the metadata is available in the most convenient form
for other XML processes.
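For instance, a FIELD declaration could carry extra name-value pairs that
no assertion refers to. The declared type is already an example, since the
FIELD implementation never tests it. The label and description parameters
below are hypothetical, and this sketch assumes your macro pre-processor
simply ignores parameters that the abstract pattern never mentions:

```xml
<sch:pattern is-a="FIELD">
   <sch:param name="entity" value="Address"/>
   <sch:param name="name" value="Postcode"/>
   <sch:param name="type" value="xs:short"/>
   <sch:param name="required" value="false" />
   <!-- Hypothetical metadata: no assertion uses these parameters, but
        other XML processes (form generators, documentation tools) can
        read them straight out of the schema -->
   <sch:param name="label" value="Postal code"/>
   <sch:param name="description" value="Shown as a hint in data-entry forms"/>
</sch:pattern>
```

Another process could then pick the label out of the schema with an XPath
such as
//sch:pattern[@is-a='FIELD'][sch:param[@name='name']/@value='Postcode']/sch:param[@name='label']/@value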

--------------------

The implementation can be really complex, because ordinary users are not
necessarily required to understand it. They can just fill in the
parameters for each kind of pattern, as above.

The implementation of the abstract patterns might be something like this
(there is probably some casting required for strings and names, but this
is enough to give the idea):

<sch:pattern id="ENTITY" abstract="true">
  <sch:rule context="/">
    <sch:assert test="true()">
      (We don't make any assertions about an entity.)
    </sch:assert>
  </sch:rule>
</sch:pattern>


<sch:pattern id="FIELD" abstract="true">
  <sch:rule context=" $entity ">
    <sch:assert test=" '$required' = 'false' or $name ">
    A <sch:name /> has a field <sch:value-of select=" '$name' "/>.
    (Fields are always serialized to XML as subelements.)
    </sch:assert>
  </sch:rule>
</sch:pattern>

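To see what the macro layer does, here is roughly what the pre-processor
might produce for the Street declaration: each $param reference in the
query attributes is replaced textually by its declared value. This is a
sketch, assuming a FIELD definition that writes '$required' and '$name' as
string literals wherever they must be compared or printed as strings
rather than evaluated as element names.

```xml
<!-- Sketch of the instantiated Street pattern after macro expansion:
     $entity -> Address, $name -> Street, $required -> true -->
<sch:pattern>
  <sch:rule context=" Address ">
    <!-- 'true' = 'false' is false, so a Street child element is required -->
    <sch:assert test=" 'true' = 'false' or Street ">
    A <sch:name /> has a field <sch:value-of select=" 'Street' "/>.
    (Fields are always serialized to XML as subelements.)
    </sch:assert>
  </sch:rule>
</sch:pattern>
```

Because each is-a instance becomes a separate pattern, the Street, Town and
Postcode rules all fire on the same Address element; had they been written
as three rules inside one pattern, only the first rule matching a given
Address would have fired.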

<sch:pattern id="ONE-TO-ONE-RELATION" abstract="true">
  <sch:rule context=" $from ">
    <sch:assert test=" $to or attribute::*[name() = '$to'] ">
    There is a one-to-one relation from <sch:name /> to
    <sch:value-of select=" '$to' "/>. (This may be expressed in
    XML by using a subelement or by using an ID with the same
    name as the entity pointed to.)
    </sch:assert>

    <sch:assert test=" count( $to | attribute::*[name() = '$to'] ) &lt;= 1 ">
    A one-to-one relation only allows a single child element or attribute.
    </sch:assert>

    <sch:assert test=" not( attribute::*[name() = '$to'] ) or
        //*[name() = '$to']/attribute::*[name() = '$from']
                     = current()/attribute::*[name() = '$to'] ">
    If a one-to-one relation is serialized in XML using a link, then
    there should be an element somewhere in the document with the name
    <sch:value-of select=" '$to' "/> which has an attribute called
    <sch:value-of select=" '$from' "/> with the same value (e.g. an ID)
    as the <sch:value-of select=" '$to' "/> attribute on the
    <sch:value-of select=" '$from' "/> element.
    </sch:assert>

  </sch:rule>

</sch:pattern>

As I mentioned before, providing the definitions is a guru/vendor task.
Using the abstract patterns is trivial form-filling.

----------------------------------

Also, note that because we have been so declarative, we could even, at a
pinch, convert the top definitions of our address schema to XSD by a
simple transformation.
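As a rough sketch of such a transformation, here is a hypothetical XSLT 1.0
stylesheet (not part of any Schematron distribution) that reads the ENTITY
and FIELD declarations and emits one global XSD element per entity. It
assumes the sch prefix is bound to the ISO Schematron namespace
http://purl.oclc.org/dsdl/schematron, and it uses xs:all because the FIELD
declarations say nothing about field order.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="sch">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every FIELD declaration by the entity it belongs to -->
  <xsl:key name="fields-by-entity"
           match="sch:pattern[@is-a='FIELD']"
           use="sch:param[@name='entity']/@value"/>

  <xsl:template match="/">
    <xs:schema>
      <!-- One global element declaration per ENTITY declaration -->
      <xsl:for-each select="//sch:pattern[@is-a='ENTITY']">
        <xsl:variable name="entity" select="sch:param[@name='name']/@value"/>
        <xs:element name="{$entity}">
          <xs:complexType>
            <xs:all>
              <!-- Each FIELD becomes a child element of the declared type -->
              <xsl:for-each select="key('fields-by-entity', $entity)">
                <xs:element name="{sch:param[@name='name']/@value}"
                            type="{sch:param[@name='type']/@value}">
                  <!-- Optional fields get minOccurs="0" -->
                  <xsl:if test="sch:param[@name='required']/@value = 'false'">
                    <xsl:attribute name="minOccurs">0</xsl:attribute>
                  </xsl:if>
                </xs:element>
              </xsl:for-each>
            </xs:all>
          </xs:complexType>
        </xs:element>
      </xsl:for-each>
    </xs:schema>
  </xsl:template>

</xsl:stylesheet>
```

Run over the Address declarations above, it should produce an Address
element whose xs:all contains Street and Town as xs:string and Postcode as
an optional xs:short. It deliberately skips ONE-TO-ONE-RELATION: a grammar
would have to commit to either nesting or linking there, which is exactly
the choice the Schematron declarations leave open.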

And we can change the serialization strategy for our data just by changing
the definitions of the abstract patterns, without touching the declarations
for the individual ER components. Want to allow any field to be an
attribute as well as a subelement? Just add one more alternative to the
end of the test in the FIELD assertion:

            or attribute::*[name() = '$name']

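With that extra alternative in place (and assuming the FIELD test treats
$required and $name as string literals where needed), a hypothetical
Address like the following, with Street and Town serialized as attributes
and Postcode as a subelement, should satisfy the same three FIELD
declarations:

```xml
<!-- Hypothetical instance: Street and Town as attributes, Postcode as a
     subelement; all values are invented for illustration -->
<Address Street="1 George Street" Town="Sydney">
  <Postcode>2000</Postcode>
</Address>
```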
Schematron allows us to model the serialization strategy independently of
its uses, in a way that leaves XSD's substitution groups in the dust.

Cheers
Rick Jelliffe

References:
- Re: [xml-dev] Victory has been declared in the schema wars ...
  - From: Rick Jelliffe <rjelliffe@allette.com.au>
- RE: [xml-dev] Victory has been declared in the schema wars ...
  - From: "Costello, Roger L." <costello@mitre.org>
- RE: [xml-dev] Victory has been declared in the schema wars ...
  - From: "Rick Jelliffe" <rjelliffe@allette.com.au>
- Re: [xml-dev] Victory has been declared in the schema wars ...
  - From: Philippe Poulard <Philippe.Poulard@sophia.inria.fr>
- Modeling ER schemas using Schematron
  - From: "Rick Jelliffe" <rjelliffe@allette.com.au>
