Thanks,
I have so far had three suggestions which I could see how to
implement - ideally they have to be based on XML syntax as that means
the amount of new code is minimised (I do not wish to write complex
interpreters in a portable environment).
(A) little languages
At 10:13 20/06/2006, Rick Jelliffe wrote:
>In some of my company's products we use our own little schema
>language that says
>
>* what elements are allowed or required
>* what attributes are allowed or required
>* what elements are only ever found in first or last position
This is my preferred solution, but only if there is a critical mass
of other XML developers who have the same view.
>We also have "usage schemas" which sample documents and generate all
>the possible grandparent/parent/child paths in the document, and
>checks other documents against these.
>
>Checking lists of tokens is indeed a very problematic area for
>Schematron using the default XSLT 1 implementations.
Agreed. This is one reason for special languages. A related area is
checking dataTypes. For example we might wish to check that a point
in a graphics language contained two positive integers, such as
<point2>12 34</point2>. I don't think Schematron has any special
support for asserting that something is a positive integer. So it
could make sense to have a function like:
<assert test="dataType(point2, 2, xsd:positiveInteger)"/>
which checks both the length of the list and the dataType.
This will not work with custom simpleTypes (unless there is access to
the schema and tools to process it). So we may need to have tools to
define custom types by extending xsd builtin types.
It also doesn't allow us to do arithmetic - we might wish to assert
that the length sqrt(x^2+y^2) is within given limits. It doesn't seem
to me that this is an unrealistically complicated type of validation test.
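As a minimal sketch of what such checks might look like in code (the class and method names are hypothetical, not part of Schematron or any existing library), the list check and the length check could be written as below. It assumes the list is whitespace-separated and that a positive integer is an optional "+" followed by decimal digits with a value greater than zero.

```java
final class ListChecks {
    /**
     * True if text holds exactly `count` whitespace-separated tokens,
     * each of which is a positive integer (optional "+", then digits, value > 0).
     */
    static boolean isPositiveIntegerList(String text, int count) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return count == 0;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length != count) {
            return false;
        }
        for (String token : tokens) {
            // leading zeros are allowed, but at least one digit must be non-zero
            if (!token.matches("\\+?0*[1-9][0-9]*")) {
                return false;
            }
        }
        return true;
    }

    /** True if sqrt(x^2 + y^2) lies within [min, max]. */
    static boolean lengthWithin(double x, double y, double min, double max) {
        double length = Math.hypot(x, y);
        return length >= min && length <= max;
    }
}
```

With these, the content of <point2>12 34</point2> would pass isPositiveIntegerList(content, 2), and the arithmetic constraint becomes a one-line call.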
>ISO DSDL was created to give a home and official status to these
>kind of little languages. If anyone can come up with a technically
>excellent and implemented little schema language that helps validate
>some significant kinds of markup idioms that XSD or the other ISO
>DSDL schema languages do not cover well (as is *entirely* possible),
>I am certain the ISO SC34 WG1 group would be interested in
>considering it for standardization, in typically unpanicked fashion.
If there are others interested then I would be interested in
suggesting use-cases for a little language that checked simpleTypes.
It should be fairly acceptable to add XSD facets to the language, perhaps like:
minInclusive($list, value) // do all values satisfy the minInclusive criterion?
minInclusive(length($list), value) // does the length of the list satisfy the minInclusive criterion?
unique($list) // are the components of the list all distinct?
hasId($value, XPathContext) // does $value correspond to the id of an element describable by the context? (I'm sure there are better suggestions here)
...
and I would like to be able to do STM maths (e.g. Math.* in Java).
I am not sure how much of this is covered by XSLT2.
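To make the facet suggestions concrete, here is a rough Java sketch of three of them. The class and method names are illustrative assumptions only, the list is assumed to have been tokenised and converted already, and hasId is left out because it needs a document context.

```java
import java.util.HashSet;
import java.util.List;

final class Facets {
    /** minInclusive($list, value): every value is >= bound. */
    static boolean minInclusive(List<Double> values, double bound) {
        for (double v : values) {
            if (v < bound) {
                return false;
            }
        }
        return true;
    }

    /** minInclusive(length($list), value): the list has at least `bound` items. */
    static boolean minInclusiveLength(List<?> values, int bound) {
        return values.size() >= bound;
    }

    /** unique($list): no two items are equal. */
    static boolean unique(List<?> values) {
        return new HashSet<>(values).size() == values.size();
    }
}
```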
(B) Schematron
>To be honest, I suspect that Schematron with a particular extension
>could pretty much do what Peter requires. In particular, ISO
>Schematron has a macro facility called abstract patterns that allow
>you to be much more declarative in labelling the participants in a
>schema relationship: you could have one like
>
><sch:pattern name="required-child" abstract="true">
> <sch:rule context="$parent">
> <sch:assert test="$child">The parent should have a child</sch:assert>
> </sch:rule>
></sch:pattern>
>
>where the $ tokens are macro arguments that are replaced by their
>invocation to give conventional Schematron schemas
>
><sch:pattern name="eg" is-a="required-child">
> <sch:param name="parent" value="Angela"/>
> <sch:param name="child" value="Suhai"/>
> <sch:param name="position" value="1" />
></sch:pattern>
>
>What this gives is enough markup that a custom processor can take
>the schema and
>generate code based on it. For example, to append a Suhai element
>to the Angela
>element in the first position. In fact, you might even decide not to
>ever validate using the Schematron schema per se, (use it as
>documentation) but to drive your superduper custom processor with
>the information specified using abstract patterns!
>
>Abstract patterns represent, I hope, a significant advance in
>home-made schema languages, because not only do you get the
>background boring power of XPath validation, but you also get the
>extra labelling required to enable identification of the parts of
>constraints and assertion
>tests. And that identification opens the door for re-targetting the
>schema for purposes such as code generation or any kind of useful
>purpose. XPaths are great because they are terse; abstract patterns
>overcome the concomitant lack of declarative expressiveness.
I have read the spec - thanks - and this may well be able to manage
much of the content validation that I currently require. It may be
that it is complementary to the dataTyping in (A)
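As I understand the macro mechanism described above, the $ tokens are replaced textually by the parameter values. A minimal sketch of that substitution in Java follows. The class name is hypothetical and this is not the ISO Schematron reference implementation. It assumes that no parameter name is a prefix of another and that values contain no $ tokens.

```java
import java.util.Map;

final class AbstractPatternExpander {
    /**
     * Replaces each "$name" token in the template with the value bound to
     * "name" in params. Plain textual substitution, nothing more.
     */
    static String expand(String template, Map<String, String> params) {
        String result = template;
        for (Map.Entry<String, String> param : params.entrySet()) {
            result = result.replace("$" + param.getKey(), param.getValue());
        }
        return result;
    }
}
```

A custom processor could read the same parameter map directly (parent, child, position) to drive code generation instead of expanding the pattern.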
(C) XQuery
>Why not XQuery, combined with MUST / MAY / MUSTNOT conditions? XQuery
>is a declarative language that can express the conditions given
>below. And I'd expect it would be fairly easy to define the
>user-declared functions you need.
>
>Jonathan Robie
I have not used XQuery very much but it looks sufficiently complex to
parse that it would be difficult to extract the declarative logic
from it without having an XQuery processor inbuilt and called at each
stage. But I would be happy to see more detail.
Implementation.
===========
In general XSD schema, Schematron and other approaches seem aimed
primarily at validating static or static-like instances of complete
documents. While this is important to me, there are at least two
other requirements:
(a) generating code. For example I have an element scalar that can
have either a "value" attribute and element-only content, or PCDATA
content of the same value (this may not be the happiest design, but
that is how it is; I am increasingly finding that I need to add
children to elements that were designed for text-only content).
Example:
<scalar dictRef="a:height">123.4</scalar>
<scalar dictRef="a:height" value="123.4"><metadata name="dc:date"
value="2006-06-23"/></scalar>
Currently my autogenerator will create:
String Scalar.getXMLContent(); // reserved name for accessing PCDATA
String Scalar.getValue(); //
If we allow something like:
<assert test="
(@value and normalize-space(.)='') or
(not(@value) and count(*)=0 and not(normalize-space(.)=''))"/>
(my XSLT is rusty, but that is meant to say that exactly one of
@value and non-empty PCDATA is allowed) then the code logic would be
something like this (I use a XOM binding):
String Scalar.getValue() {
// there is a superclass that provides a simple getter
String value = super.getValue();
String x = super.getXMLContent();
boolean noText = (x == null || x.trim().equals(""));
// either a value attribute and no text, or text with no attribute and no children
boolean valueOnly = value != null && noText;
boolean textOnly = value == null && this.getChildElements().size() == 0 && !noText;
Assert.assertTrue("must have exactly one of value attribute and text content",
valueOnly || textOnly);
return valueOnly ? value : x.trim();
}
This will automatically capture the data in the required order and
should be autogeneratable from the declarative language.
(b) validation during parsing.
I am increasingly using this approach to validate as a document is
parsed. Where possible XML tools are used but obviously some of this
has to be bespoke (although it will be autogenerated). This means
there is no need for heavyweight tools such as Xerces and that I only
need as much apparatus to validate the input as is defined in the schema.
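To illustrate the idea of checking a constraint while the document is being read, here is a minimal sketch using the streaming parser bundled with the JDK (javax.xml.stream). This is an assumption made for illustration, not the XOM-based approach I actually use, and the class name is hypothetical. It applies the scalar rule from (a) and assumes scalar elements are not nested.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class ScalarStreamCheck {
    /** True if every scalar element has exactly one of @value or non-empty text. */
    static boolean validate(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int depthInScalar = 0;       // 0 means we are outside any scalar element
            boolean hasValue = false;    // current scalar has a value attribute
            int childCount = 0;          // elements seen inside the current scalar
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (depthInScalar > 0) {
                        depthInScalar++;
                        childCount++;
                    } else if ("scalar".equals(reader.getLocalName())) {
                        depthInScalar = 1;
                        hasValue = reader.getAttributeValue(null, "value") != null;
                        childCount = 0;
                        text.setLength(0);
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && depthInScalar > 0) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT && depthInScalar > 0) {
                    depthInScalar--;
                    if (depthInScalar == 0) {
                        // the scalar has just closed: apply the rule immediately
                        boolean blank = text.toString().trim().isEmpty();
                        boolean valueOnly = hasValue && blank;
                        boolean textOnly = !hasValue && childCount == 0 && !blank;
                        if (!(valueOnly || textOnly)) {
                            return false;
                        }
                    }
                }
            }
            return true;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```

The check fires as soon as each scalar element closes, so an invalid document can be rejected without building a tree of the whole instance.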
(c) validation of complete documents.
Ideally this should be possible using Schematron and other commodity
approaches without the custom code. But it requires extensions to the
current toolkit.
============
In summary, therefore, I would be interested in:
- a communal little language for validating dataTypes
- exploration of the range of concepts that are not supported in
current schemas, ideally to find a consensus on the costs and benefits
of extensions.
- any other experience and comments.
Many thanks
P.
Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road, Cambridge CB2 1EW, UK
+44-1223-763069