[xml-dev] RelaxNGCC : introducing a new way to process / transform XML documents ?



Hi,

I've just had a look at Daisuke Okajima's RelaxNGCC 0.3 [1]. I found this
tool inherently interesting, as well as the ideas it raises.

RelaxNGCC allows you to write an XML document processor by writing Java code
associated with various XML elements directly in the Relax NG schema
describing the source document. Have a look at tutorials 1 and 2 for a quick
understanding of this.

The first interesting idea is that this is a use of schemas that goes
further than pure document validation - an idea that I have always supported
but never had the time / the tools to put into practice. I'll come back to
that later.

The second interesting idea is that Mr Okajima (or is it Mr Daisuke ?) used
Sun's Multi Schema Validator [2], a library that can parse various schema
languages into an abstract grammar.  Using such a library frees people from
having to handle the wide variety and complexity of current schema languages,
so they can focus on algorithms rather than on the languages. By supporting
"RELAX NG, RELAX Namespace, RELAX Core, TREX, XML DTDs, and most of XML
Schema", the MSV may be a precious tool for programmers who want to build
tools on top of schemas without being buried deep in piles of obscure
specs.

The third interesting idea is the purpose of the project : you can use
decorated schemas to describe the processing or transformation of XML
documents. The tutorial examples demonstrate this quite well.

<digression>
What I find interesting here is that you could extend this idea, enabling
developers to write XML content within the schema. You get a transformation
language a bit like "reversed" XSLT : instead of writing unstructured
templates that have to match patterns within the source document, the
templates are implicitly defined by the schema.

What would be the advantage of such a language ?

1) the "stylesheet" would always be in sync with the source schema
2) the "stylesheet" would be more easily written, because the structure of
each element is clearly expressed in the schema, thus easily usable for
writing the output XML.
3) the "stylesheet" would be more easily read, because it would be naturally
structured like the source document, with "templates" clearly ordered in a
logical way.
4) context-sensitive processing would be a breeze to write and understand,
because the position of a "template" within the schema clearly defines its
context.
5) the source document would be validated at the same time as it is processed.
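
To make points 4) and 5) concrete, here is a minimal sketch of the idea in
plain Java. It uses neither RelaxNGCC nor MSV: the Decl class, its "template"
function and the person / name / email vocabulary are all hypothetical names
invented for the illustration, and the "schema" only supports a fixed sequence
of required children. Each declaration carries its own template, so the
document is checked and transformed in the same pass.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch: element declarations that carry their own output template. */
final class SchemaTemplates {

    /** One declaration: element name, ordered required children, attached template. */
    static final class Decl {
        final String name;
        final List<Decl> children = new ArrayList<>();
        // Receives the text of a leaf, or the concatenated output of the children.
        final Function<String, String> template;

        Decl(String name, Function<String, String> template, Decl... kids) {
            this.name = name;
            this.template = template;
            for (Decl k : kids) children.add(k);
        }
    }

    /** Checks the element against its declaration and builds the output in one pass. */
    static String process(Element e, Decl decl) {
        if (!e.getTagName().equals(decl.name)) {
            throw new IllegalArgumentException(
                "expected <" + decl.name + "> but found <" + e.getTagName() + ">");
        }
        if (decl.children.isEmpty()) {
            return decl.template.apply(e.getTextContent().trim());
        }
        StringBuilder body = new StringBuilder();
        int i = 0;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue; // skip text and comments
            if (i >= decl.children.size()) {
                throw new IllegalArgumentException(
                    "unexpected <" + ((Element) n).getTagName() + ">");
            }
            // The position in the schema decides which template applies.
            body.append(process((Element) n, decl.children.get(i++)));
        }
        if (i < decl.children.size()) {
            throw new IllegalArgumentException("missing <" + decl.children.get(i).name + ">");
        }
        return decl.template.apply(body.toString());
    }

    static String transform(String xml, Decl root) {
        Element doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception ex) { // parser configuration, I/O or well-formedness error
            throw new IllegalStateException("cannot parse input", ex);
        }
        return process(doc, root);
    }

    /** Demo "decorated schema": person = (name, email). */
    static Decl personSchema() {
        return new Decl("person", body -> "<tr>" + body + "</tr>",
                new Decl("name", text -> "<td>" + text + "</td>"),
                new Decl("email", text ->
                        "<td><a href=\"mailto:" + text + "\">" + text + "</a></td>"));
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<person><name>Ann</name><email>ann@example.org</email></person>",
            personSchema()));
    }
}
```

A document that does not follow the declared structure (for instance a person
without a name) is rejected with an IllegalArgumentException instead of being
transformed, which is the "validated while processed" behaviour of point 5).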

There would be a big limitation, however : the ordering of the output nodes
depends directly on the ordering of the input nodes. For example, you can
forget <xsl:sort>. One could imagine special output nodes that ensure an
ordering on their child elements according to given criteria. But even the
simple task of switching two elements, as in <a><b/><c/></a> => <a><c/><b/></a>,
is not done easily. This could be solved, but anyway, even if it's not possible
to overcome this limitation, this processing model can be useful.
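
The "special output node" mentioned above could be sketched as follows, again
in plain Java and under my own assumptions: OrderedOutput, child() and close()
are hypothetical names, not part of RelaxNGCC. The node buffers the output of
its children as they arrive in input order, and only emits them, rearranged by
a comparator on a sort key, when the element is closed. The price is that the
children must be held in memory until then.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical "ordering output node": buffers child output, emits it sorted on close. */
final class OrderedOutput {

    /** One buffered child: its sort key and its already-serialized output. */
    private static final class Fragment {
        final String key;
        final String xml;

        Fragment(String key, String xml) {
            this.key = key;
            this.xml = xml;
        }
    }

    private final String name;
    private final Comparator<String> order;
    private final List<Fragment> buffer = new ArrayList<>();

    OrderedOutput(String name, Comparator<String> order) {
        this.name = name;
        this.order = order;
    }

    /** Called once per child, in input order, while the source is processed. */
    void child(String key, String xml) {
        buffer.add(new Fragment(key, xml));
    }

    /** Emits the element with its children rearranged according to the comparator. */
    String close() {
        // List.sort is stable, so children with equal keys keep their input order.
        buffer.sort(Comparator.comparing((Fragment f) -> f.key, order));
        StringBuilder sb = new StringBuilder("<" + name + ">");
        for (Fragment f : buffer) {
            sb.append(f.xml);
        }
        return sb.append("</").append(name).append(">").toString();
    }

    public static void main(String[] args) {
        // Input order is b, c; sorting the keys in reverse order swaps them.
        OrderedOutput a = new OrderedOutput("a", Comparator.reverseOrder());
        a.child("b", "<b/>");
        a.child("c", "<c/>");
        System.out.println(a.close());
    }
}
```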
</digression>

Regards,
Nicolas

[1] http://homepage2.nifty.com/okajima/relaxngcc/index_en.htm
[2] http://www.sun.com/software/xml/developers/multischema/