OASIS Mailing List Archives

Re: Type and Structure Re: ASN.1 and XML



 From: "James Clark" <jjc@jclark.com>

> I do not see either TREX or RELAX as trying to interpose a type
> system between markup structure and the model (I assume you
> mean a conceptual/semantic model).

Nevertheless they do.

> In fact, this is one of the big differences in philosophy I see
> between TREX and RELAX on the one hand, and W3C XML
> Schema on the other.

Yes, I think that is a very important point.  Schematron can be added to the
list alongside TREX and RELAX, in that it also comes from the view that
adding type information to the infoset for use in general processing
adds a level of complexity (i.e. it requires schema processing rather than
simple keying off the element's generic identifiers) that is utterly the
wrong design for the web.

And it is no concession (as one of the few people in the world to have
written a book on DTDs!) for me to say that I like grammars, use them,
recommend them, think they are a smart and simple way to make assertions
about simple structures, etc.

But my point is different. Grammars are quite handy for some structures; but
they are dogs at others.  Grammars with nice set-operation properties can be
invented; but that won't make them any more usable than DTDs (though
certainly more powerful) or XML Schemas.   Having technology that can be
grasped and used by the non-elite is a more worthy and difficult goal than
science or elegance alone, to me (not that the latter is remotely unworthy
nor difficult, of course): the rest of the XML community seems on a
pro-grammar crusade that perpetuates the disconnect between concept and
expression.  One thinks about things in some terms, then tries to fit those
thoughts in a grammar.

For example, look at XML Schemas: the specification is written entirely
around the idea of components, but there is no "component" element or type
in sight.  Its grammar is just concerned with expressing what it can: the
markup gives little indication of how things are grouped.

A better syntax for XML Schemas 2.0 might be along these lines:

 <component name="HTML Table" markup="html:table" >
      <set>
             <rcdata name="caption" markup="html:caption" />
             <component name="HTML body" >
                  <set name="HTML row" markup="html:tr" >
                      <mixed name="HTML data" markup="html:td" />
                  </set>
             </component>
      </set>
  </component>

Now of course this can be reduced to a grammar. (And of course one would
have to introduce occurrence indicators, if one wanted to use it for some
kinds of uses.)
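
As a rough illustration of that reduction, here is a minimal Python sketch. It assumes, purely for the example, that every node carrying a markup attribute becomes a grammar rule, that nodes without one are transparent, and that occurrence indicators are left out. The names marked_children and to_grammar are hypothetical, not part of any schema language.

```python
import xml.etree.ElementTree as ET

# The component description from the example above, tidied for parsing.
SCHEMA = """
<component name="HTML Table" markup="html:table">
  <set>
    <rcdata name="caption" markup="html:caption"/>
    <component name="HTML body">
      <set name="HTML row" markup="html:tr">
        <mixed name="HTML data" markup="html:td"/>
      </set>
    </component>
  </set>
</component>
"""

def marked_children(node):
    """Yield the nearest descendants that carry a markup attribute."""
    for child in node:
        if "markup" in child.attrib:
            yield child
        else:
            # Assumption: a node with no markup is a purely conceptual
            # grouping, so its marked-up descendants are hoisted upwards.
            yield from marked_children(child)

def to_grammar(root):
    """Map each marked-up node to a DTD-like content model (no occurrence indicators)."""
    rules = {}

    def visit(node):
        if "markup" in node.attrib:
            kids = [k.get("markup") for k in marked_children(node)]
            rules[node.get("markup")] = ", ".join(kids) if kids else "#PCDATA"
        for child in node:
            visit(child)

    visit(root)
    return rules

rules = to_grammar(ET.fromstring(SCHEMA))
for name, model in rules.items():
    print(f"{name} ::= {model}")
```

Note that the "HTML body" component has no rule of its own in the result: the grouping exists in the model but leaves no trace in the grammar.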

But the point is that some non-terminal symbols in a grammar have
significance as more than just conveniences.  For example, let's look at
HTML's meta element: there are two flavours (one with an http-equiv
attribute, one without).  We could have a schema language which allows both:

 <component name="HTML meta">
    <choice>
      <bag  name="HTML meta with http-equiv"
            markup="html:meta[@http-equiv]" >
           <string name="HTTP equiv" markup="@http-equiv" />
           <string name="meta contents" markup="@content" />
      </bag>
      <bag  name="HTML meta"
            markup="html:meta" >
           <string name="name" markup="@name" />
           <string name="meta contents" markup="@content" />
      </bag>
    </choice>
 </component>

In doing so, we are modeling the concepts, but in a way that is entirely
amenable to conversion to validators (either to a grammar system or
something like Schematron).  Some conceptual structures have markup
representations; some do not--but the thing that determines them is not the
mechanics of constructing a grammar: they correspond to logical structures
that are not (and sometimes cannot be) marked up as the element tree.
Grammars should be an implementation technique, not the nub of the question.
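
To make the "conversion to validators" concrete, here is a small Python sketch in the assertion style (loosely in the spirit of Schematron, but not Schematron itself). It assumes that each bag becomes a rule whose context is its markup pattern and whose assertions are that the listed attributes are present. It uses HTML's actual content attribute name, ignores the html: prefix, and understands only patterns of the form name or name[@attr]. The names compile_rules and classify are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

SCHEMA = """
<component name="HTML meta">
  <choice>
    <bag name="HTML meta with http-equiv" markup="html:meta[@http-equiv]">
      <string name="HTTP equiv" markup="@http-equiv"/>
      <string name="meta contents" markup="@content"/>
    </bag>
    <bag name="HTML meta" markup="html:meta">
      <string name="name" markup="@name"/>
      <string name="meta contents" markup="@content"/>
    </bag>
  </choice>
</component>
"""

# Assumption: patterns look like "prefix:name" or "prefix:name[@attr]" only.
PATTERN = re.compile(r"^(?:\w+:)?([\w-]+)(?:\[@([\w-]+)\])?$")

def compile_rules(schema_text):
    """Turn each bag into (concept name, tag, required predicate attr, asserted attrs)."""
    rules = []
    for bag in ET.fromstring(schema_text).iter("bag"):
        tag, needed = PATTERN.match(bag.get("markup")).groups()
        attrs = [s.get("markup").lstrip("@") for s in bag.findall("string")]
        rules.append((bag.get("name"), tag, needed, attrs))
    return rules

def classify(element, rules):
    """Return (concept name, missing attributes) for the first rule whose context matches."""
    for name, tag, needed, attrs in rules:
        if element.tag == tag and (needed is None or needed in element.attrib):
            missing = [a for a in attrs if a not in element.attrib]
            return name, missing
    return None, []

rules = compile_rules(SCHEMA)
print(classify(ET.fromstring('<meta http-equiv="refresh" content="5"/>'), rules))
print(classify(ET.fromstring('<meta name="author"/>'), rules))
```

The result names the concept that matched, not just a pass or fail, which is the sense in which the non-terminal is more than a convenience. A grammar-based back end could be generated from the same description.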

The XML Schema WG made several decisions, based on demarcation with RDF
Schemas, that I think need not be followed by other schema-language
developers:  that bags and sets were not to be used, that links were not to
be followed, that we should not start with anything other than grammars.

I note the following too: if a schema language were made along component
lines like the one above, it would be possible to define mappings
to schema languages. For example:

<component name="Table"
       markup:html="table" markup:cals="tbl" >
      <set>
             <rcdata name="caption"
                    markup:html="caption" markup:cals="title" />
             <component name="table body"
                     markup:cals="tbody"  >
                  <set name="table row"
                     markup:html="tr"  markup:cals="row"  >
                      <mixed name="data"
                            markup:html="td" markup:cals="cell" />
                  </set>
             </component>
      </set>
  </component>

So we model the information (i.e. we get much closer to some ER system) then
we use XPaths to describe how the information maps onto some XML
representation.   Then whether this is implemented using
assertions or a grammar is not of interest to the schema user.  And (at
least for some significant structures) the problem of cross-mapping data
(same information, different structures) between schemas goes away.
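
A minimal Python sketch of that cross-mapping, driven by the Table description above. It assumes a made-up namespace URI for the markup: prefix (the example does not declare one), treats each markup value as a plain child element name, not a full XPath, and copies text only for leaf concepts. The function name translate is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:markup"  # hypothetical namespace for the markup: prefix

SCHEMA = f"""
<component xmlns:markup="{NS}" name="Table" markup:html="table" markup:cals="tbl">
  <set>
    <rcdata name="caption" markup:html="caption" markup:cals="title"/>
    <component name="table body" markup:cals="tbody">
      <set name="table row" markup:html="tr" markup:cals="row">
        <mixed name="data" markup:html="td" markup:cals="cell"/>
      </set>
    </component>
  </set>
</component>
"""

def translate(schema_text, source_root, src="html", dst="cals"):
    """Rebuild source_root in the dst vocabulary by walking the concept tree."""
    schema = ET.fromstring(schema_text)
    src_holder = ET.Element("holder")   # lets the root concept be found as a child
    src_holder.append(source_root)
    dst_holder = ET.Element("holder")

    def build(node, src_ctx, dst_ctx):
        src_name = node.get(f"{{{NS}}}{src}")
        dst_name = node.get(f"{{{NS}}}{dst}")
        # A concept with no source markup stays in the current source context.
        instances = src_ctx.findall(src_name) if src_name else [src_ctx]
        for inst in instances:
            # A concept with no target markup adds nothing to the output tree.
            out = ET.SubElement(dst_ctx, dst_name) if dst_name else dst_ctx
            if src_name and dst_name and len(node) == 0:
                out.text = inst.text  # simplification: inline children are dropped
            for child in node:
                build(child, inst, out)

    build(schema, src_holder, dst_holder)
    return dst_holder[0]

html = ET.fromstring(
    "<table><caption>Prices</caption>"
    "<tr><td>tea</td><td>2</td></tr>"
    "<tr><td>milk</td><td>1</td></tr></table>"
)
cals = translate(SCHEMA, html)
print(ET.tostring(cals, encoding="unicode"))
```

The "table body" concept has no HTML markup, so the sketch introduces the tbody wrapper on the CALS side without any special-case code; calling translate with src="cals" and dst="html" drops it again.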

I hope that makes clearer the scope and nature of why I would lump RELAX
and TREX in with XML Schemas in regard to types.

Cheers
Rick Jelliffe