xml-dev - Random thoughts on reflection

To: xml-dev <xml-dev@lists.xml.org>
Subject: Random thoughts on reflection
From: Eric van der Vlist <vdv@dyomedea.com>
Date: 09 Aug 2003 20:53:16 +0200
Organization: Dyomedea (http://dyomedea.com)

Hi,

Just some random thoughts, on my way back from Extreme, about Henry
Thompson and Ari Krupnikov's presentation on the XML Infoset as a
reflexion of XML documents.

I am offline at the Montreal airport waiting for my flight and can't
check the details, but here is very briefly what I have understood and
remember about this presentation:


      * The XML Infoset can be seen as a reflexion (i.e. a description
        which can be expressed in XML) of an XML document.
        
      * In XSLT, it is easy (and elegant IMO) to expose this reflexion
        as an extension function.
        
      * This reflexion can be extended with annotations, for instance to
        represent the PSVI.

If we take a document such as:

<foo bar="valbar">
 <baz>valbaz</baz>
</foo>

The syntax proposed for exposing the XML infoset as a reflexion of the
document is something like:

<element namespaceName="" localName="foo">
 <attributes>
  <attribute namespaceName="" localName="bar" normalizedValue="valbar"/>
 </attributes>
 <children>
  <element namespaceName="" localName="baz">
   <attributes/>
   <children>
    <textNode value="valbaz"/>
   </children>
  </element>
 </children>
</element>
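In XSLT this reflexion would be exposed as an extension function. As a rough, processor-neutral illustration of how mechanical it is, here is a minimal Python sketch built on the standard xml.etree.ElementTree module. The function names (split_name, add_text_node, reflect) are hypothetical, not part of HT & AK's proposal, and the sketch is deliberately limited: it covers elements, attributes and text only (ElementTree's default parser drops comments and PIs anyway), and it assumes whitespace-only text should be skipped, as the example above does.

```python
import xml.etree.ElementTree as ET


def split_name(name):
    """Split an ElementTree '{uri}local' name into (namespaceName, localName)."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return "", name


def add_text_node(children, text):
    # Assumption: whitespace-only text is skipped, as in the example above.
    if text and text.strip():
        ET.SubElement(children, "textNode", value=text)


def reflect(el):
    """Return the reflexion of an element as a new ElementTree element."""
    ns, local = split_name(el.tag)
    out = ET.Element("element", namespaceName=ns, localName=local)
    attributes = ET.SubElement(out, "attributes")
    for name, value in el.attrib.items():
        a_ns, a_local = split_name(name)
        # The XML parser has already normalized the attribute value.
        ET.SubElement(attributes, "attribute", namespaceName=a_ns,
                      localName=a_local, normalizedValue=value)
    children = ET.SubElement(out, "children")
    add_text_node(children, el.text)  # text before the first child element
    for child in el:
        children.append(reflect(child))
        # ElementTree stores the text following a child on the child's tail.
        add_text_node(children, child.tail)
    return out


doc = ET.fromstring('<foo bar="valbar">\n <baz>valbaz</baz>\n</foo>')
print(ET.tostring(reflect(doc), encoding="unicode"))
```

Serialized this way, the result has the same structure as the reflexion shown above, just without the indentation.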

So far, so good! I really like the idea and would like to propose
another angle (or another reflexion) on this concept.

We are used to thinking of schemas as the definition of a set of
instance documents which are so-called "valid" per the schema. In other
words, a schema (in any schema language) is the definition of a set of
instance documents, and I think that it's fair to say that, conversely,
a set of instance documents can be considered as a schema.

If we accept this, we can say that the infoset shown above is a schema
defining a set of instance documents: the singleton composed of our
instance document only.

If we consider the set of all the schemas, in all the possible schema
languages, per which a given instance document is valid (i.e. the set of
all sets of instance documents containing our instance document), it
becomes clear that the infoset is the intersection of all these schemas.
In WXS terms, we can say that the infoset is a derivation by restriction
of each of these schemas.

If we accept to consider the infoset as a schema, the next question is
whether existing schema languages could be used to serialize the infoset
(otherwise, why should we want to reinvent the wheel?).

If we try to do the exercise with Relax NG, we get:

<element name="foo" xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="" ns="">
  <attribute name="bar" ns="">
   <value type="string">valbar</value>
  </attribute>
  <element name="baz" ns="">
   <value type="string">valbaz</value>
  </element>
</element>


Which I find amazingly similar to the reflexion syntax proposed by HT &
AK.

This also buys us a non-XML syntax, the compact syntax, for the same
price:

element foo {
 attribute bar {string "valbar"},
 element baz {string "valbaz"}
}
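The same derivation can be done mechanically for the compact syntax. The Python sketch below builds the singleton schema of a document; the helper names (rnc_name, literal, to_rnc) are hypothetical and this is not an existing tool. To keep it short, it assumes documents without namespaces, drops whitespace-only text between elements as the example above does, refuses values that would need escaping, and raises an error on mixed content, for the reason discussed further down.

```python
import xml.etree.ElementTree as ET

# Compact syntax keywords must be escaped with a backslash when used as names.
KEYWORDS = {
    "attribute", "default", "datatypes", "div", "element", "empty",
    "external", "grammar", "include", "inherit", "list", "mixed",
    "namespace", "notAllowed", "parent", "start", "string", "text", "token",
}


def rnc_name(name):
    if name.startswith("{"):
        raise ValueError("namespaced names are not handled in this sketch")
    return "\\" + name if name in KEYWORDS else name


def literal(value):
    # No escaping in this sketch: reject values that would need it.
    if any(c in value for c in '"\\\n\r'):
        raise ValueError("value needs escaping: %r" % value)
    return '"%s"' % value


def to_rnc(el, indent=""):
    """Return a compact-syntax pattern matching only this element."""
    inner = indent + " "
    items = ["%sattribute %s {string %s}" % (inner, rnc_name(n), literal(v))
             for n, v in el.attrib.items()]
    children = list(el)
    if children:
        texts = [el.text] + [c.tail for c in children]
        if any(t and t.strip() for t in texts):
            # A fixed value can't be grouped with element patterns.
            raise ValueError("mixed content can't be expressed")
        items += [to_rnc(c, inner) for c in children]  # whitespace dropped
    elif el.text:
        items.append("%sstring %s" % (inner, literal(el.text)))
    body = ",\n".join(items) if items else inner + "empty"
    return "%selement %s {\n%s\n%s}" % (indent, rnc_name(el.tag), body, indent)


doc = ET.fromstring('<foo bar="valbar">\n <baz>valbaz</baz>\n</foo>')
print(to_rnc(doc))
```

For the foo document, this prints the compact schema above, with the content of baz moved onto its own line.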

Of course, this could also be done with W3C XML Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
 <xs:element name="foo">
  <xs:complexType>
   <xs:sequence>
    <xs:element name="baz">
     <xs:simpleType>
      <xs:restriction base="xs:string">
       <xs:enumeration value="valbaz"/>
      </xs:restriction>
     </xs:simpleType>
    </xs:element>
   </xs:sequence>
   <xs:attribute name="bar" use="required">
    <xs:simpleType>
     <xs:restriction base="xs:string">
      <xs:enumeration value="valbar"/>
     </xs:restriction>
    </xs:simpleType>
   </xs:attribute>
  </xs:complexType>
 </xs:element>
</xs:schema>


But I'd argue that Relax NG is much closer to the original proposal for
the reflexion serialization, and WXS would cause other issues, such as
the fact that different namespaces can't be mixed in a single schema
document.

Now, of course, this simple document isn't representative of the
complexity of typical XML documents. Amongst the challenges that come to
mind, I'd like to mention mixed content and historical curiosities such
as XML comments and PIs.

If we consider a document such as:

<foo>
 before
 <bar/>
 after
</foo>

Or even if we just want to keep the whitespace in:

<foo>
 <bar/>
</foo>

We run into trouble with both WXS and RNG, since neither allows us to
write things such as:

element foo {
 string "\x{A} before\x{A} ",
 element bar {empty},
 string "\x{A} after\x{A}"
}

Those of you familiar with my postings (and book in progress) will have
recognised one of my favorite permathreads, but that's not my point here
and I'll leave it alone.
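To see exactly what a singleton schema would have to pin down here, the minimal Python sketch below lists the children of foo in document order, using the standard xml.etree.ElementTree module and a hypothetical child_items helper. It is an approximation: ElementTree merges adjacent characters into text strings, whereas the infoset proper has one character information item per character.

```python
import xml.etree.ElementTree as ET


def child_items(el):
    """List an element's children in document order as (kind, value) pairs."""
    items = []
    if el.text:  # text before the first child element
        items.append(("text", el.text))
    for child in el:
        items.append(("element", child.tag))
        if child.tail:  # text following this child element
            items.append(("text", child.tail))
    return items


mixed = ET.fromstring("<foo>\n before\n <bar/>\n after\n</foo>")
print(child_items(mixed))
```

For both documents above, the result is a text item, the bar element and another text item, each text value fixed to its exact characters, newlines included. Fixing those values on either side of an element pattern is precisely the combination that RNG and WXS rule out.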

Another issue arises if we want to support information items such as
comments and PIs, which are discarded from the data model supported by
Relax NG.

The most elegant way to solve these issues would be to extend the
features of Relax NG, but as a short-term solution, it should also be
possible to add annotations.

Finally, the link between all this and Examplotron is also interesting
IMO. We've seen that the infoset is a schema which validates a single
document. Since the infoset is a reflexion of the instance document, it
is equivalent to this document, and the instance document itself is
therefore its own schema. From this new angle, Examplotron is a proposal
to extend this schema to validate a class of documents which can be
considered as "similar" to our instance document. It is thus the
definition of an annotation-controlled generalisation (I'd say extension
if WXS wasn't using this term with a different meaning) mechanism.

I think we've closed the loop between instances and schemas (probably
more than once already) and I'll spare you further reflections... and
the vertigo they would provoke!

If you really want to go on, I have always thought that XPath paths and
Relax NG patterns can be considered as reflections of each other (and
not only because James Clark is involved in both)... If that's the case,
the infoset can be expressed as XPath expressions, but do you really
want me to elaborate?

Eric
-- 
Read me on XMLhack.
                                      http://xmlhack.com/author.php?id=8
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------
