Re: [xml-dev] Is Schematron (using XPath 2.0) functionally a superset of XML Schemas?
- From: Rick Jelliffe <rjelliffe@allette.com.au>
- To: xml-dev@lists.xml.org
- Date: Mon, 12 Nov 2007 18:36:57 +1100
On Sat, 2007-11-10 at 11:58 +0100, Michele Vivoda wrote:
> Hi,
>
> I use the schema also for its data-description
> features: to know which element/attrs are allowed,
> datatype etc, before validation,
> to _produce_ instances, that then I will validate.
> I think that it would be hard to
> get the same description from a list of xpaths,
> or at least I wouldn't know where to start.
Where to start? Well, forget grammars and learn XPaths!
More seriously, apart from the issue of expressive power (which
Schematron w XSLT2 wins AFAICS) I would never dispute that some things
are easier to express in Schematron and some things are easier to
express in XSD; indeed, some things are easier to express using XPaths
and some things are easier to express using grammars. However, you set
yourself a hard task if you are defending XSD because of its
user-friendliness :-)
Why is it particularly harder to write a flat set of declarations
<sch:pattern>
  <sch:rule context="AAA">
    ...
  </sch:rule>
  <sch:rule context="AAA/BBB">
    ...
  </sch:rule>
</sch:pattern>
than a nested set?
<xsd:element name="AAA">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="BBB">
        ...
      </xsd:element>
      ...
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Indeed, if you are processing using XSLT or DOM or most object models,
you are probably using XPaths to locate the information items you are
interested in. You are quite probably already doing XPath processing, so
why isn't validation merely double handling? It is not impossible to
imagine a processor like Saxon doing dynamic typing of data items by
using XPath matching, rather than implementing an FDA (a finite
automaton), especially because of that declaration rule I forgot last
week.
I don't see why validating against an FDA and then using XPath matching
is not really a kind of double handling: lazy type implication using
XPaths would seem to be a respectable implementation technique that in
some situations would give good performance.
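To make the idea of typing by path matching concrete, here is a minimal Python sketch. It is only an illustration, not how Saxon or any real processor works: the rule table, the type names and the helper functions are all hypothetical, and it handles only simple child-step paths rather than full XPath. Each element's type is found by matching its path against a flat list of patterns, with no automaton involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rule table: (path pattern, type name).
# As in a Schematron pattern, the first matching rule wins.
RULES = [
    ("AAA/BBB", "BBB-type"),
    ("AAA", "AAA-type"),
]

def type_for(path):
    """Return the type for an element path, or None if no rule matches."""
    for pattern, type_name in RULES:
        # A relative pattern matches when it is a trailing run of path steps.
        if path == pattern or path.endswith("/" + pattern):
            return type_name
    return None

def annotate(elem, parent_path="", out=None):
    """Walk the tree once and record (path, type) for every element."""
    if out is None:
        out = []
    path = elem.tag if not parent_path else parent_path + "/" + elem.tag
    out.append((path, type_for(path)))
    for child in elem:
        annotate(child, path, out)
    return out

doc = ET.fromstring("<AAA><BBB/><CCC/></AAA>")
result = annotate(doc)
```

The same lookup could be deferred until an item is actually touched, which is where the "lazy" part would come in.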
> Is it possible to go back from the schematron
> to the w3c schema ?
The area of information modeling, rather than just expression of bags of
constraints, is of course quite important. (Whether it is so important
that it should trump Plain Old Assertions is another matter.)
IMHO Schematron actually has a more powerful story than XSD in regard to
modeling. Now Schematron does have a built-in set of
component-equivalents, based on my view of the problems of DTDs (you
might say that XSD embraced DTDs to the extent of wanting to explicate
them, while I thought they were an inadequate basis.) Hence phases ->
patterns -> rules -> assertions -> diagnostics.
However, the thing that I consider more powerful is the provision of
abstract patterns (and abstract rules). These allow a schema designer to
invent and implement their own modeling system; their own abstractions
which they may check using XPaths or their own software or just leave as
formalized statements that have no computerized checking.
For example, using abstract rules, the following:
<sch:rule abstract="true" id="t2" role="xsd-simpleType">
  <sch:extends rule="t1" />
  <sch:assert test=". &lt; 4" role="xsd-facet-maxExclusive">
    The value should be less than 4
  </sch:assert>
</sch:rule>
has all the information needed to generate
<xsd:simpleType name="t2">
  <xsd:restriction base="t1">
    <xsd:maxExclusive value="4" />
  </xsd:restriction>
</xsd:simpleType>
and to round-trip. You say xsd:tomato and I say sch:tomato. As BR
pointed out, it also has enough information that a type-annotating
system could use this to annotate DOM items with type (and derivation)
information.
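As a rough sketch of the "support software" side of this, the following Python reads an abstract rule written to such a convention and emits the corresponding XSD simple type. The role naming convention (xsd-simpleType, xsd-facet-*) is the one used in this message; the function name and the way the bound is pulled out of the test expression are assumptions made for illustration only. A "less than" test corresponds to the xsd:maxExclusive facet, so that is the role used here.

```python
import re
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"   # ISO Schematron namespace
XSD = "http://www.w3.org/2001/XMLSchema"

RULE = """
<sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
          abstract="true" id="t2" role="xsd-simpleType">
  <sch:extends rule="t1"/>
  <sch:assert test=". &lt; 4" role="xsd-facet-maxExclusive">
    The value should be less than 4
  </sch:assert>
</sch:rule>
"""

def rule_to_simple_type(rule):
    """Build an xsd:simpleType element from a rule that follows the convention."""
    ET.register_namespace("xsd", XSD)
    simple = ET.Element("{%s}simpleType" % XSD, {"name": rule.get("id")})
    base = rule.find("{%s}extends" % SCH).get("rule")
    restriction = ET.SubElement(simple, "{%s}restriction" % XSD, {"base": base})
    for a in rule.findall("{%s}assert" % SCH):
        role = a.get("role", "")
        if role.startswith("xsd-facet-"):
            facet = role[len("xsd-facet-"):]
            # Assumption: the bound is the first number in the test expression.
            value = re.search(r"-?\d+(?:\.\d+)?", a.get("test")).group()
            ET.SubElement(restriction, "{%s}%s" % (XSD, facet), {"value": value})
    return simple

simple_type = rule_to_simple_type(ET.fromstring(RULE))
xsd_text = ET.tostring(simple_type, encoding="unicode")
```

Going the other way (XSD to abstract rule) would be the mirror image, which is what makes the round trip plausible.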
For content models, abstract patterns come into play. You could have an
abstract pattern usage such as
<sch:pattern is-a="element-grammar">
  <sch:param name="element" value="AAA" />
  <sch:param name="grammar" value="' BBB, CCC+, DDD | EEE, FFF? '" />
</sch:pattern>
which could then have a vacuous implementation*
<sch:pattern abstract="true" id="element-grammar">
  <sch:rule context=" $element ">
    <sch:assert test="true()">The element <sch:name/>
      should follow the content model
      <sch:value-of select=" $grammar "/>
    </sch:assert>
  </sch:rule>
</sch:pattern>
So it is not true to say that you cannot represent all the information
in a Schematron schema that would be needed to round-trip to an XSD
schema: it is just a matter of conventions and support software.
Schematron is one step ahead of XSD in that while XSD provides a
container (appinfo) in which a similar mechanism could be constructed,
Schematron provides a way to give such user-defined constructs a name,
parameterize the declarations, give the parameters names, and pass
through user-friendly text.
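For readers who have not used abstract patterns: instantiation is essentially textual substitution of the named parameters into the abstract pattern. A minimal Python sketch of that step follows; the function name is hypothetical and a real Schematron implementation does this over the parsed schema rather than over a string.

```python
import re

# Body of the abstract pattern, with $element and $grammar as parameters.
ABSTRACT = '''<sch:rule context=" $element ">
  <sch:assert test="true()">The element <sch:name/> should follow
  the content model <sch:value-of select=" $grammar "/></sch:assert>
</sch:rule>'''

# Parameter values as given on the instantiating pattern.
PARAMS = {"element": "AAA", "grammar": "' BBB, CCC+, DDD | EEE, FFF? '"}

def instantiate(template, params):
    """Replace each $name token in the template with its parameter value."""
    # Try longer names first so a longer name is not clobbered by a
    # shorter one that happens to be its prefix.
    names = sorted(params, key=len, reverse=True)
    token = re.compile(r"\$(" + "|".join(re.escape(n) for n in names) + r")")
    return token.sub(lambda m: params[m.group(1)], template)

concrete = instantiate(ABSTRACT, PARAMS)
```

After substitution the rule context is simply AAA and the value-of selects a string literal, which is why the grammar value carries its own single quotes.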
Interestingly, it is entirely possible to use Schematron abstract
patterns to make up your own schema language, which you may then
implement yourself without using the Schematron XPaths mechanism at all:
you just use Schematron as a vocabulary for driving your own modeling
application. Schematron has this basic extensibility that XSD (and other
schema languages that don't provide an explicit mechanism) do not.
Can we have our cake and eat it? Can we express the grammar in a way
suitable for round-tripping and also express the constraints in a way
that Schematron can use to validate? Here is an example of this:
<sch:pattern is-a="element-grammar-1">
  <sch:param name="element" value="AAA" />
  <sch:param name="grammar" value="' BBB, CCC+, DDD | EEE, FFF? '" />
  <sch:param name="children" value=" BBB | CCC | DDD | EEE | FFF " />
  <sch:param name="children-text"
             value="' BBB or CCC or DDD or EEE or FFF '" />
</sch:pattern>
and then implement some tests
<sch:pattern abstract="true" id="element-grammar-1">
  <sch:rule context=" $element ">
    <!-- untested assertion -->
    <sch:assert test="true()">The element <sch:name/>
      should follow the content model
      <sch:value-of select=" $grammar "/>
    </sch:assert>
    <!-- tested assertion -->
    <sch:assert test=" count( $children ) = count(*)">
      The element <sch:name/> should only have children from
      the following list: <sch:value-of select=" $children-text "/>
    </sch:assert>
  </sch:rule>
</sch:pattern>
Of course, we could have software that automagically generated the extra
parameters from the first form of abstract pattern: the information is
all there.
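A sketch of what that generation might look like, in Python. The function name is hypothetical, and it assumes the grammar string uses plain element names separated by the usual DTD-style operators (no namespaces prefixes, no nested groups needing special treatment).

```python
import re

def derive_params(grammar):
    """Derive the 'children' and 'children-text' parameters from a grammar string."""
    names = []
    # Pick out the element names; operators such as , | + ? are skipped.
    for name in re.findall(r"[A-Za-z_][\w.-]*", grammar):
        if name not in names:          # keep first occurrence, preserve order
            names.append(name)
    return {
        # An XPath union, usable inside count( ... ).
        "children": " | ".join(names),
        # A quoted string literal, usable in value-of for the message text.
        "children-text": "'" + " or ".join(names) + "'",
    }

params = derive_params("' BBB, CCC+, DDD | EEE, FFF? '")
```

The derived values can then be written out as additional sch:param elements on the instantiating pattern.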
Anyway, the bottom line is that Schematron allows both the abstract
specification of all sorts of constraints (with a mechanism of named
parameters, which is more than XSD provides) and the specific testing of
many kinds of constraints (using XPaths, which are more powerful than
what XSD provides.) So the gap is in the middle, in developing
conventions and software to implement the home-made schema-models that
users may develop.
XSD uses grammars because XML DTDs used grammars. XML DTDs used grammars
because SGML DTDs used grammars. SGML DTDs used grammars because the
delimiter map could change while parsing an element: short references in
particular. So for SGML, grammars were both a necessity and wonderful
low-hanging fruit. But for XML, we have no technical requirement that
forces us to use grammars to model: we can explore and figure out the
best mechanism for our tasks. Grammars are familiar (though not to DB
people); they are terse (though not in the XSD syntax); they are
powerful (though not in the XSD model); they are easy to implement (but
not with all the XSD extras); but schema != grammar. Any grammar is a
round hole, but sometimes you have a square peg. And XSD is sometimes a
very tight hole indeed.
Cheers
Rick Jelliffe
*
http://www.oreillynet.com/xml/blog/2007/03/expressing_untested_and_untest.html