From: "Doug Ransom" <Doug.Ransom@pwrm.com>
> I have been thinking that it would be an interesting exercise to implement a
> compiler that transforms XML Schema into Schematron (probably in XSLT).
You might start by processing your instance document with Francis Norton's
TypeTagger XSLT scripts, which add the type names (of elements) to
an instance. That will simplify life a lot, because then the Schematron
schema has access to type information from its XPath 1.0 expressions.
For translating an XML Schema into Schematron, you need to implement
three parts:
1) Keyref and uniqueness. I think this would be very simple to implement.
2) Datatypes. Because datatypes in XML Schemas only derive from each
other by restriction, you can merely add an extra set of assertions for every
restriction. For example, if you have a string type X restricted to 15 characters,
you can make an assertion to test that. If there is also a derived type X'
restricted to 10 characters, you can add an extra assertion for that too.
The difficulty here is that converting Perl-style regular expressions to
the simple string operations in XSLT expressions might be a little
challenging.
3) Structures. The lion's share of content models can be expressed
in any of the schema languages; the edge cases are important,
but may be difficult.
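
To make part 2 concrete, here is a rough sketch in Python of emitting one
Schematron assertion per restriction step. The function name
length_assertions and the input shape (a list of type name and maximum
length pairs, base type first) are made up for illustration, and only
length restrictions are handled; it is not a real translator.

```python
from xml.sax.saxutils import escape


def length_assertions(chain):
    """Emit one Schematron assert per restriction step.

    `chain` is a hypothetical input shape: (type_name, max_length)
    pairs ordered from the base type to the most derived type.
    """
    asserts = []
    for type_name, max_length in chain:
        # The XPath test; "<" must be escaped inside an XML attribute.
        test = f"string-length(.) <= {max_length}"
        message = (
            f"Value must be at most {max_length} characters "
            f"(type {type_name})."
        )
        asserts.append(
            f'<sch:assert test="{escape(test)}">{escape(message)}</sch:assert>'
        )
    return asserts


# Type X restricted to 15 characters, derived type X' restricted to 10.
assertions = length_assertions([("X", 15), ("X-prime", 10)])
print("\n".join(assertions))
```

A value of the derived type is checked against both assertions, which is
harmless because each restriction step only narrows the previous one.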
One interesting approach is to generate every possible path in the
document. Instead of slicing the structure up vertically, you are slicing
it horizontally.
So for HTML you might make rules for all these paths:
/html
/html/head
/html/head/title[count(../title)=1]
/html/head/meta
/html/body[count(../body)=1]
/html/body/p
table/tr
table/tr/td
table/tr/th
ul/li
ol/li
dl/dt
dl/dd[preceding-sibling::dt]
...
and then a rule that reports any element that does not fit one of the rules,
plus rules for other patterns, for example that a/../a is an error.
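
The "report anything that fits no rule" idea can be sketched outside
Schematron in a few lines of Python. This is only an illustration under
simplifying assumptions: the bracketed tests are dropped, rules starting
with "/" must match the whole path, and other rules match as a path suffix.
The names ALLOWED, element_paths, fits and unmatched are invented here.

```python
import xml.etree.ElementTree as ET

# The paths from the list above, with the bracketed tests left out.
ALLOWED = [
    "/html", "/html/head", "/html/head/title", "/html/head/meta",
    "/html/body", "/html/body/p",
    "table/tr", "table/tr/td", "table/tr/th",
    "ul/li", "ol/li", "dl/dt", "dl/dd",
]


def element_paths(root):
    """Yield the absolute path of every element, e.g. '/html/body/p'."""
    stack = [(root, "/" + root.tag)]
    while stack:
        elem, path = stack.pop()
        yield path
        for child in elem:
            stack.append((child, path + "/" + child.tag))


def fits(path, rule):
    """Absolute rules match the whole path; relative rules match a suffix."""
    if rule.startswith("/"):
        return path == rule
    return path.endswith("/" + rule)


def unmatched(root, rules=ALLOWED):
    """Return the sorted paths of elements that fit none of the rules."""
    return sorted(
        p for p in element_paths(root)
        if not any(fits(p, r) for r in rules)
    )


doc = ET.fromstring(
    "<html><head><title>t</title></head>"
    "<body><p>hi</p><blink/></body></html>"
)
print(unmatched(doc))  # ['/html/body/blink']
```

In a real Schematron schema the catch-all would be a final rule in the same
pattern, since only the first matching rule in a pattern fires for a node.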
The interesting part, good for a bit of thought or a research project, would be
to figure out how to automatically derive the correct tests (in the brackets)
which give various kinds of position information. Also, the optimal length
of each path is interesting: presumably there would be efficiency
considerations for different policies.
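
As a starting point for deriving the bracketed tests, the simplest case is
an upper bound on occurrences, which maps directly onto a count() test like
the ones in the list above. The Python sketch below covers only that case;
the function names and the max_occurs parameter (None meaning unbounded)
are assumptions for illustration. A lower bound cannot be tested this way,
because an absent element produces no node for the path to match.

```python
def occurrence_predicate(name, max_occurs=1):
    """Return an XPath predicate limiting how many `name` siblings exist.

    `max_occurs` of None means unbounded, so no predicate is needed.
    """
    if max_occurs is None:
        return ""
    if max_occurs == 1:
        return f"[count(../{name})=1]"
    return f"[count(../{name})<={max_occurs}]"


def path_with_predicate(parent_path, name, max_occurs=1):
    """Build one path rule such as /html/head/title[count(../title)=1]."""
    return f"{parent_path}/{name}{occurrence_predicate(name, max_occurs)}"


print(path_with_predicate("/html/head", "title", 1))
print(path_with_predicate("/html/head", "meta", None))
```

An element that exceeds its bound then matches no path rule and falls
through to the rule that reports unmatched elements.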
It is perhaps a distinctive enough approach that it could have a schema
language all of its own ("chain" would be a good name), but I don't have
the energy.
Cheers
Rick Jelliffe