xml-dev - Re: [xml-dev] DTDs, W3C Schemas, RELAX NG, Schematron?

Re: [xml-dev] DTDs, W3C Schemas, RELAX NG, Schematron?

[ Lists Home | Date Index | Thread Index ]

To: "Tim Bray" <tbray@textuality.com>
Subject: Re: [xml-dev] DTDs, W3C Schemas, RELAX NG, Schematron?
From: "James Clark" <jjc@jclark.com>
Date: Fri, 24 May 2002 14:41:52 +0700
Cc: <xml-dev@lists.xml.org>
References: <200205231824.OAA13725@mail.reutershealth.com> <3CED39E9.3060407@textuality.com>

> John Cowan wrote:
>
> > 9) RNG is closed under union and intersection: given two document
classes
> > described by RNG schemas, one can mechanically construct an RNG schema
> > which describes documents appearing in both classes, and another which
> > describes documents appearing in either class.
>
> Did we lose set difference?  The thing that knocked me out years ago
> when Murata started talking about set operations on hedge automata was
> the result of Schema A - Schema B.  Here's the application: you rev a
> schema, then take the difference of version N - version N+1, and get a
> schema for the class of documents you've just broken.  -Tim

Difference is also important for static type-checking.  If you want to
statically check that the output of some program with some input schema is
guaranteed to be valid with respect to some output schema, then firstly you
need to compute a schema that represents the result of the program when
applied to a document with the input schema and then you need to compute
whether that schema is a subset of the output schema. (You can also do it
the other way round by computing the schema for the class of documents that
when the program is applied to them produce documents conforming to the
output schema and then check that this schema is a subset of the input
schema.)  Checking that S1 is a subset of S2 is the same as checking that
the difference of S1 and S2 is empty.

The question as regards intersection and difference is rather more complex
than the question as regards union.  With union, it's dead simple: if you
have two RELAX NG schemas, s1.rng and s2.rng, then

<choice>
  <externalRef href="s1.rng"/>
  <externalRef href="s2.rng"/>
</choice>

is a schema that matches the union.

However, RELAX NG doesn't have general intersection and difference
operators. With intersection and difference, there's a theoretical level at
which the issue is simple.  If you consider only elements (i.e. ignore
attributes and text), then the class of language that RELAX NG schemas (like
RELAX Core) can represent is regular tree languages.  And it is a well-known
theoretical result that regular tree languages are closed under intersection
and difference.

However, there is a big gap between theory and a practical application:

- how big is the schema representing the intersection/difference?  It might
be too big to be practical.

- how long does it take to compute the schema for the
intersection/difference?

- what about attributes?

- what about typed data?

I don't know the answers to these questions.

The good news is that there's a very smart researcher, Haruo Hosoya, who
created XDuce for his PhD, a large part of which is dealing with computing
subset relationships for a type system which is very similar to RELAX NG.
His last release of XDuce (http://xduce.sourceforge.net) adds support for
attributes using the same approach as RELAX NG.

My best guess of the situation is:

- it should be possible is to create a practical application that computes
the intersection or difference of two RELAX NG schemas; XDuce is pretty
close to doing just this (however, it's written in O'Caml, which is a very
powerful, but rather non-mainstream language);

- it is not trivial to write such an application;

- there may well be some exceptional cases, where the application couldn't
do it at all, or where the result would be impractically large, or where it
would take impractically long to compute the result.

Hosoya-san or Murata-san would be able to give a more informed answer to
this.

James

References:
- Re: [xml-dev] DTDs, W3C Schemas, RELAX NG, Schematron?
  - From: John Cowan <jcowan@reutershealth.com>
- Re: [xml-dev] DTDs, W3C Schemas, RELAX NG, Schematron?
  - From: Tim Bray <tbray@textuality.com>

Prev by Date: Re: [xml-dev] DTDs, W3C Schemas, RELAX NG, Schematron?
Next by Date: ANN: XML-DBMS version 2.0, alpha 3
Previous by thread: Re: [xml-dev] DTDs, W3C Schemas, RELAX NG, Schematron?
Next by thread: Re: [xml-dev] DTDs, W3C Schemas, RELAX NG, Schematron?
Index(es):
- Date
- Thread