   Re: [xml-dev] Representing interdependence of elements within a schema o

From: "John Rivett-Carnac" <jbrc@chorusconsulting.com>


> Using a separate schema for each of the three states turns out to be a very
> clumsy way of handling things. In the example above it is just about
> manageable but where there are many levels and elements/sub elements it is
> almost impossible to represent. How can I easily define the interdependence
> between all these parts within a single schema or dtd, so that the document
> can move through different states, each state having different rules about
> the presence or absence of the different parts?

Here are five approaches.

1) Use multiple DTDs.

<!ELEMENT doc ((part1, (part2, binder?)?) | part2) >
<!ELEMENT part1 ((body, part1_annex?)?) >
<!ELEMENT part2 ((body, (part2_annex1, part2_annex2?)?)?) >

This DTD just gives the invariants that hold at all the stages.  So it
is not so useful on its own: you still need a final DTD for the finished
document, and then you have the management issues you describe.
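
To make the limitation concrete, here is a small sketch (untested) of an
early draft that the invariant DTD accepts even though nothing in it is
complete. The leaf declarations (body, the annexes and binder as #PCDATA)
are my assumption, since the content of those elements has not been given.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc ((part1, (part2, binder?)?) | part2) >
  <!ELEMENT part1 ((body, part1_annex?)?) >
  <!ELEMENT part2 ((body, (part2_annex1, part2_annex2?)?)?) >
  <!-- Assumed leaf declarations: not part of the original content models -->
  <!ELEMENT body (#PCDATA) >
  <!ELEMENT part1_annex (#PCDATA) >
  <!ELEMENT part2_annex1 (#PCDATA) >
  <!ELEMENT part2_annex2 (#PCDATA) >
  <!ELEMENT binder (#PCDATA) >
]>
<!-- An early draft: only part1, and only its body. The DTD accepts this,
     so it cannot tell you whether the document is ready to bind. -->
<doc>
  <part1>
    <body>Draft text</body>
  </part1>
</doc>
```

The same DTD also accepts the finished, bound document, which is exactly
why it cannot distinguish the states from each other.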

2) Use feasible validation

Another approach is to use James Clark's Jing validator (it validates
against schemas in the simple RELAX NG language). It supports "feasible
validation": you write just one schema with all the constraints of the
finished document, but Jing only checks some of them (element spelling,
position, etc.), not whether required elements are present. That does not
really model your process.
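
As a rough sketch (untested), the single schema for the finished document
might look like this in the RELAX NG XML syntax. Treating the leaf elements
as plain text is my assumption. Jing's documentation describes a -f
command-line option for the feasible mode.

```xml
<!-- RELAX NG schema for the finished (bound) document only.
     Every element is required here; feasible validation relaxes that. -->
<element name="doc" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="part1">
    <!-- Leaf content as text is an assumption, not from the original -->
    <element name="body"><text/></element>
    <element name="part1_annex"><text/></element>
  </element>
  <element name="part2">
    <element name="body"><text/></element>
    <element name="part2_annex1"><text/></element>
    <element name="part2_annex2"><text/></element>
  </element>
  <element name="binder"><text/></element>
</element>
```

Under feasible validation a draft containing only part1 with a body should
pass, because the missing elements could still be inserted later. A doc
with binder placed before part1 should fail, because no insertion can fix
the order.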

3) Use stages

Recently on XML-DEV Francis Norton announced a preprocessor
for W3C's XML Schemas which adds a mechanism for "stages". This
switches production rules in and out.

4) The schema language Schematron (soon to be ISO Schematron)
provides "phases" which let you directly group and model constraints
of the kind you mention.

Here is an (untested) Schematron schema for you.  It may seem a
little long, but that is because you can write very specific validation
messages tailored to your task.

<schema xmlns="http://www.ascc.net/xml/schematron">
    <title>Schema for JBRC</title>

    <phase id="Incomplete">
        <active pattern="noBinder" />
        <active pattern="invariants" />
    </phase>
    <phase id="ReadyToBind">
        <active pattern="partsComplete" />
        <active pattern="invariants" />
    </phase>
    <phase id="Bound">
        <active pattern="invariants" />
        <active pattern="partsComplete" />
        <active pattern="requiredBinder" />
    </phase>

    <pattern id="noBinder">
        <rule context="/doc">
            <report test="binder"
            >In the initial stage, a doc should have no binder</report>
        </rule>
    </pattern>

    <pattern id="partsComplete">
        <rule context="doc">
            <assert test="*[1][self::part1]"
            >The first child of doc must be part1</assert>
            <assert test="*[2][self::part2]"
            >The second child of doc must be part2</assert>
        </rule>
        <rule context="part1">
            <assert test="*[1][self::body]"
            >The first child of part1 must be body</assert>
            <assert test="*[2][self::part1_annex]"
            >The second child of part1 must be part1_annex</assert>
        </rule>
        <rule context="part2">
            <assert test="*[1][self::body]"
            >The first child of part2 must be body</assert>
            <assert test="*[2][self::part2_annex1]"
            >The second child of part2 must be part2_annex1</assert>
            <assert test="*[3][self::part2_annex2]"
            >The third child of part2 must be part2_annex2</assert>
        </rule>
    </pattern>

    <pattern id="requiredBinder">
        <rule context="/doc">
            <assert test="count(binder) = 1"
            >A bound document must have a binder</assert>
        </rule>
    </pattern>


    <pattern id="invariants">
        <rule context="/doc">
            <assert test="count(part1) &lt;=1"
            >A doc may have one part1</assert>
            <assert test="count(part2) &lt;=1"
            >A doc may have one part2</assert>
            <assert test="count(binder) &lt;=1"
            >A doc may have one binder</assert>
        </rule>
        <rule context="part1">
            <assert test="count(body) &lt;=1"
            >A part1 may have one body</assert>
            <assert test="count(part1_annex) &lt;=1"
            >A part1 may have one part1_annex</assert>
            <report test="part1_annex and not(body)"
            >A part1_annex needs to come after a body.</report>
        </rule>
        <rule context="part2">
            <assert test="count(body) &lt;=1"
            >A part2 may have one body</assert>
            <assert test="count(part2_annex1) &lt;=1"
            >A part2 may have one part2_annex1</assert>
            <assert test="count(part2_annex2) &lt;=1"
            >A part2 may have one part2_annex2</assert>
            <report test="part2_annex1 and not(body)"
            >A part2_annex1 needs to come after a body.</report>
            <report test="part2_annex2 and not(body)"
            >A part2_annex2 needs to come after a body.</report>
        </rule>
    </pattern>

</schema>
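
To show how the phases separate the states, here is a hypothetical
instance (untested, with empty leaf elements as placeholders of my own)
and what I would expect each phase to say about it.

```xml
<!-- Phase "Incomplete":  passes (no binder, all invariants hold).
     Phase "ReadyToBind": fails, "The third child of part2 must be
                          part2_annex2".
     Phase "Bound":       fails for the same reason, and also with
                          "A bound document must have a binder". -->
<doc>
  <part1>
    <body/>
    <part1_annex/>
  </part1>
  <part2>
    <body/>
    <part2_annex1/>
    <!-- part2_annex2 has not been written yet -->
  </part2>
</doc>
```

The document itself does not change between runs. Only the phase chosen
at validation time decides which patterns are applied.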

5) Parallel schemas

You can combine schemas to get the best of both worlds. If you validate
all your documents using the DTD above, which just models the invariants
(you can use DTDs, RELAX NG or W3C's XML Schemas for this), you can
reduce the Schematron schema to just this:

<schema xmlns="http://www.ascc.net/xml/schematron">
    <title>Schema for JBRC</title>

    <!-- Model the phases the document passes through -->
    <phase id="Incomplete">
        <active pattern="noBinder" />
    </phase>
    <phase id="ReadyToBind">
        <active pattern="partsComplete" />
    </phase>
    <phase id="Bound">
        <active pattern="partsComplete" />
        <active pattern="requiredBinder" />
    </phase>

    <!-- model the constraints -->
    <pattern id="noBinder">
        <rule context="doc">
            <report test="binder" >
                In the initial stage, a doc should have no binder
            </report>
        </rule>
    </pattern>

    <pattern id="partsComplete">
        <rule context="doc">
            <assert test="part1 and part2">
                A doc should have a part1 and a part2
            </assert>
        </rule>
        <rule context="part1">
            <assert test="body and part1_annex">
                A part1 should have a body and a part1_annex
            </assert>
        </rule>
        <rule context="part2">
            <assert test="body and part2_annex1 and part2_annex2">
                A part2 should have a body and a part2_annex1 and a part2_annex2
            </assert>
        </rule>
    </pattern>

    <pattern id="requiredBinder">
        <rule context="doc">
            <assert test="binder">
                A bound document must have a binder
            </assert>
        </rule>
    </pattern>

</schema>

This kind of parallel use becomes very straightforward to write,
because each schema language is being used to express the
things it is good at.
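
A hypothetical instance (untested, with empty leaf elements as
placeholders of my own) shows why the two schemas need each other. The
reduced Schematron schema on its own would let this through in the Bound
phase, and it is the DTD that catches the problem.

```xml
<!-- Reduced Schematron schema, phase "Bound": passes. Both parts are
     complete and the test "binder" is true whenever at least one
     binder exists.
     DTD: fails, because the content model of doc allows at most one
     binder. -->
<doc>
  <part1>
    <body/>
    <part1_annex/>
  </part1>
  <part2>
    <body/>
    <part2_annex1/>
    <part2_annex2/>
  </part2>
  <binder/>
  <binder/>
</doc>
```

So the grammar looks after counts and order, and the Schematron phases
only have to say which parts must be present in which state.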

Cheers
Rick Jelliffe




 
