A readable schema language ?


We are currently badly in need of a simple schema language that would be
almost self-documenting. Ideally, I would like a schema language that could
be printed on reference cards and used by developers or content writers. I
know the best way to write an XML document that follows a schema is to get
help from a schema-aware XML editor, but the current ones are rare and do
not behave as I expect. The language should be simple enough that developers
could write new schemas without needing to read 300 pages (or even 30 pages)
of documentation.

We have another need: an open schema implementation. By open, I do not mean
open-source. I mean that the schema structure is implemented in a way that
permits uses other than XML validation. A schema can be used to write
custom XML editors, to perform on-the-fly validation (i.e. while the XML
document is generated, not after), to embed application-specific
information, to optimize XPath expressions and so on. I am convinced that
there are a lot of possible uses that we haven't thought of yet, just
because we are focused on validation. The current implementations are
somewhat closed, because their API is totally oriented towards validation,
and their in-memory representation is validation-centric instead of
I did my best to learn W3C XML Schema and wrote a few schemas for some of
the formats we use. The problem with W3C XML Schema is that it is awfully
awkward. We can't print a W3C XML Schema and expect content writers to
understand it. We can't expect a group of developers to write coherent
schemas. There are so many ways to do things in XML Schema (using
simpleTypes, complexTypes with/without simpleContent, extension,
substitution groups, etc.) that all we can expect is to get schemas that are
not readable, and not even usable for validation (because of bugs in the
schema or in the validator). We don't want to debug schemas if it is not
required. Lastly, I can't find an open W3C XML Schema implementation and I
can't implement it myself because I don't want to spend the rest of the year
doing so. I could quickly implement a subset of W3C XML Schema (following
Mr. Kawaguchi's advice from his "W3C XML Schema made simple" article), but
then again I wouldn't have a truly readable schema, so why bother
implementing it
I also studied Schematron, TREX, RELAX and RELAX NG, which are brilliant
pieces of work. They are all much simpler than XML Schema, so it's easier to
write schemas in these languages. There are even some XML structures that
can be specified with these languages but not with W3C XML Schema. They also
seem easier to implement (though the possible ambiguities in RELAX and
RELAX NG mean a bit of algorithmic work). But then again, they force the
schema writer to think in the way of the schema validator, not in the
"natural" way.

By the "natural" way, I mean the kind of approach that led to Examplotron.
Currently, for the same schema, we have two versions: a "geeky" W3C XML
Schema document, used for validation, and a "natural", informal schema for
"normal" people, specified in a non-XML syntax similar to:

<!-- A result page -->
<result [title="page title"]>
    <section id="section id" [title="section title"]>
        <p [align="left|right|center"]>
          Sample text <i>in italic</i> in <b>bold</b>, in <i><b>italic

This is very bad (not XML), yet understandable. We thought a bit about our
problem, and we may have found a solution that suits our needs. It is still
in its preliminary stages, but I'd like to present it to you, readers of
this list, since you may have some criticism or advice to give us before we
go further. We have named our language RESCALE, for ReadablE SChemA
LanguagE. The similarity in pronunciation with RELAX is no accident, since
all we have now is a RESCALE -> RELAX NG translator (through an XSLT
stylesheet).

The attached file pml.rescale.xml <<pml.sample.xml>> contains a sample (and
incomplete) schema for one of our XML applications, PML. As you can see,
it's XML, and straightforward to read:

<result title="optional" xmlns:schema="http://www.ubicco.com/ns/rescale">
  <section id="required" title="optional">
    <p align="optional,default=left" mode="optional,default=wrap">
      <schema:zeroOrMore schema:id="paragraphContent">
          <b><schema:ref name="paragraphContent"/></b>
          <!-- etc. -->

Schema writing is pretty simple: any element not in the
http://www.ubicco.com/ns/rescale namespace is treated as a template.
Elements from the http://www.ubicco.com/ns/rescale namespace usually have
the same meaning as in RELAX NG. Any element (whether it's in the RESCALE
namespace or not) can have a schema:id, and can be referenced by using the
schema:ref

The attribute syntax is non-XML. This can be a problem for implementations,
but it is great for readability. An attribute specification is composed of a
number of comma-separated terms. The first one can be "required", "optional"
or "fixed=<value>", and tells whether the attribute is required, optional or
must have a given value. The second one gives the default value for the
attribute, written "default=<value>" as in the sample above (it is not
mapped to RELAX NG, since RELAX NG does not support default attribute
values). We plan on adding a kind of datatype specification for attributes,
but for the moment it is not supported.
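
As an illustration of the attribute syntax described above, here is a
minimal sketch of a parser for it, written in Python. It is not part of
RESCALE or of the attached stylesheet: the names AttributeSpec and
parse_attribute_spec are hypothetical, and rejecting unknown terms with an
error is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeSpec:
    use: str                       # "required", "optional" or "fixed"
    fixed: Optional[str] = None    # the value when use == "fixed"
    default: Optional[str] = None  # the value of a default=<value> term


def parse_attribute_spec(text: str) -> AttributeSpec:
    """Parse a comma-separated spec such as "optional,default=left"."""
    terms = [term.strip() for term in text.split(",")]
    first = terms[0]
    if first in ("required", "optional"):
        spec = AttributeSpec(use=first)
    elif first.startswith("fixed="):
        spec = AttributeSpec(use="fixed", fixed=first[len("fixed="):])
    else:
        raise ValueError(f"unknown first term: {first!r}")
    # Remaining terms: only default=<value> is recognised in this sketch.
    for term in terms[1:]:
        if term.startswith("default="):
            spec.default = term[len("default="):]
        else:
            raise ValueError(f"unknown term: {term!r}")
    return spec
```

For example, parse_attribute_spec("optional,default=wrap") yields a spec
whose use is "optional" and whose default is "wrap". Note that a value
containing a comma cannot be expressed with this simple splitting.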

We believe our content editors and developers will be able to easily read
and write such schemas. I'd like to have your comments on this point.

The attached file rescale2relax.xslt transforms this RESCALE
(wannabe-)schema into a RELAX NG schema: <<rescale2relaxng.xslt>>
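
To give an idea of the mapping without opening the stylesheet, here is a
rough sketch of the same idea in Python. It is not the attached XSLT and
makes several assumptions: template elements are assumed to carry no
namespace, a template element without child elements is given <text/>
content, default values are dropped, and schema:id is ignored, so a template
using schema:ref would not yet produce a complete RELAX NG grammar. The
function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RESCALE_NS = "http://www.ubicco.com/ns/rescale"
RNG_NS = "http://relaxng.org/ns/structure/1.0"


def rng(tag, **attrs):
    """Create an element in the RELAX NG namespace."""
    return ET.Element(f"{{{RNG_NS}}}{tag}", attrs)


def translate_attribute(name, spec):
    """Map one template attribute to a RELAX NG attribute pattern."""
    first = spec.split(",")[0].strip()
    attribute = rng("attribute", name=name)
    if first.startswith("fixed="):
        value = rng("value")
        value.text = first[len("fixed="):]
        attribute.append(value)
    if first == "optional":
        wrapper = rng("optional")
        wrapper.append(attribute)
        return wrapper
    return attribute


def translate(node):
    """Translate a RESCALE node and its children into RELAX NG."""
    if node.tag.startswith(f"{{{RESCALE_NS}}}"):
        # RESCALE element: reuse its local name in the RELAX NG namespace
        # and copy its plain attributes (schema:id is skipped).
        out = rng(node.tag.split("}", 1)[1])
        for name, value in node.attrib.items():
            if not name.startswith("{"):
                out.set(name, value)
    else:
        # Template element: becomes <element name="...">.
        out = rng("element", name=node.tag)
        for name, spec in node.attrib.items():
            if not name.startswith("{"):
                out.append(translate_attribute(name, spec))
        if len(node) == 0:
            out.append(rng("text"))  # assumption: leaf elements hold text
    for child in node:
        out.append(translate(child))
    return out
```

Calling translate(ET.fromstring(source)) on a RESCALE document string
returns the root of the RELAX NG tree, which ET.tostring can serialize.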

I tested this with Michael Kay's SAXON 6.2.2 for XSLT and James Clark's JING
for RELAX NG parsing/validation.

Nicolas Lehuen
Head of R&D
Ubicco - Multi Access Software Solutions