   PROPOSAL: XSchema declarations and constraints

  • From: Paul Prescod <papresco@technologist.com>
  • To: XML Dev <xml-dev@ic.ac.uk>
  • Date: Wed, 10 Jun 1998 10:50:25 -0400

John Cowan wrote:
> 
> I was attempting to adopt the distinction between "what IS so" and
> "what MUST BE so" (which DTDs conflate) and see where it led.

This is an excellent way of phrasing the distinction. Its clarity raises
a troubling question, but also a big opportunity. Metadata is an important
part of what we want XSchema to support, yet metadata is inherently about
"what IS so" rather than "what MUST BE so". The big opportunity is the
idea that just went "click" in my head.

--

In the XML SIG, I have promoted the idea that "what IS so" should be
cleanly separated conceptually from "what MUST BE so" in the namespaces
specification. The "what IS so" document could be called a "vocabulary" or
"dictionary" or "namespace" or "directory." Vocabulary documents would
have textual descriptions of element types and attribute types,
knowledge-representation information, and so forth: metadata. It could
also contain or link to a "default" stylesheet or "default" schema. SGML
always had dictionaries, but they never existed in standardized formats.
TEI's dictionary was in the TEI DTD ("the TEI Guidelines"), DocBook's was
in DocBook, HTML's is in HTML ("the HTML 4.0 spec").

A schema is about "what MUST BE so". It defines a class of languages. Of
course schemas will (almost?) always be tied to vocabularies, just as
stylesheets will be tied to vocabularies. That doesn't mean that they are
the same thing, but merely that they are often related. It is undeniably
convenient to be able to specify metadata and schema information in the
same document.

The distinction is philosophical, technical and also practical. I've just
described the philosophical distinction. Now on to the technical and
practical ones:

#1. We don't want to duplicate document-type metadata in multiple schemas
for closely related schemas. HTML 1.0, 2.0, 3.2 and 4.0 all have more or
less the same semantics for the A element. It exists, it stands for
"anchor", it is a type of hypertext link, it marks both sides of the
links, etc. etc. etc. But they all have different content models and
attributes for A. If schemas are separate from metadata, then we don't
have to repeat the metadata in each document. Repetition gets us back to
the issue of redundancy/conflict. What happens if two versions of HTML
accidentally describe a different semantic for the A element? That strikes
me as a problem.
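
To make the point concrete, here is a minimal sketch in Python. The names
VOCABULARY, SCHEMAS and describe are made up for illustration, and the
attribute sets are simplified placeholders, not copied from the HTML
specifications. The description of A is stored once; each schema version
carries only its own constraints.

```python
# Hypothetical vocabulary: "what IS so". One entry per element type,
# written once and shared by every schema that uses the type.
VOCABULARY = {
    "A": "Anchor: a hypertext link element that can mark either end of a link.",
}

# Hypothetical schemas: "what MUST BE so". The attribute sets below are
# illustrative placeholders, not the real HTML attribute lists.
SCHEMAS = {
    "html-2.0": {"A": {"attributes": {"HREF", "NAME"}}},
    "html-4.0": {"A": {"attributes": {"HREF", "NAME", "TARGET"}}},
}


def describe(schema_name, element_type):
    """Combine the shared meaning with one schema's own constraints."""
    rules = SCHEMAS[schema_name][element_type]
    return {
        "meaning": VOCABULARY[element_type],
        "attributes": sorted(rules["attributes"]),
    }


old = describe("html-2.0", "A")
new = describe("html-4.0", "A")
```

Because the meaning comes from a single place, the two versions cannot
drift apart in what they say A is, even though their constraints differ.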

#2. Elements can have multiple constraints applied to them. The
constraints are just each tested one after the other. But an element can
only have a single definition. That's why I originally suggested an
XSL-like syntax for describing constraints:

<RULE>
   <PATTERN><TARGET-ELEMENT TYPE="FOO"/></PATTERN>
   <CONSTRAINT>...</CONSTRAINT>
</RULE>

<RULE>
   <PATTERN><BAR><TARGET-ELEMENT TYPE="FOO"/></BAR></PATTERN>
   <CONSTRAINT>...</CONSTRAINT>
</RULE>


but now we have ended up with an "element-definition"-like syntax:

<ELEMENTTYPE>
 ...
</ELEMENTTYPE>

The first is from a "this must be true and maybe there are other things
around that must be true" mentality. The second is from a "this and only
this is true" mentality.

I'm not proposing the first TODAY, but I am asking that we leave ourselves
open to full patterns and multiple constraint rules per element type, just
as there can be full patterns and multiple stylesheet rules per element
type in XSL. The only difference would be that in XSL a single matching
rule is chosen, whereas in our schema language ALL matching rules would be
tested. When you combine this with schema inclusion (in some future
version), you also get the ability to "tighten up" constraints from an
imported DTD.

Note that the way I describe it above is inherently extensible in two
ways. Patterns can become more advanced. Constraints can also become more
advanced.
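
The "test ALL matching rules" behaviour can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the RULES
table, the tuple form of a pattern, and the helper names are hypothetical,
and the two constraints are invented examples. A pattern here is a chain
of element type names ending in the target, so ("BAR", "FOO") means "a FOO
whose parent is a BAR".

```python
import xml.etree.ElementTree as ET


def has_attribute(name):
    """Constraint factory: the element must carry the named attribute."""
    return lambda elem: name in elem.attrib


def max_children(n):
    """Constraint factory: the element may have at most n child elements."""
    return lambda elem: len(list(elem)) <= n


# Hypothetical rule table: (pattern, constraint, label).
RULES = [
    (("FOO",), has_attribute("ID"), "every FOO needs an ID"),
    (("BAR", "FOO"), max_children(0), "a FOO inside a BAR must be empty"),
]


def matches(pattern, ancestry):
    """True if the tail of the ancestry (root first) equals the pattern."""
    return tuple(ancestry[-len(pattern):]) == pattern


def validate(root):
    """Apply every matching rule to every element; collect the failures."""
    failures = []

    def walk(elem, ancestry):
        ancestry = ancestry + [elem.tag]
        # Unlike XSL, no single "best" rule is picked: all matches are tested.
        for pattern, constraint, label in RULES:
            if matches(pattern, ancestry) and not constraint(elem):
                failures.append(("/".join(ancestry), label))
        for child in elem:
            walk(child, ancestry)

    walk(root, [])
    return failures


doc = ET.fromstring(
    '<DOC><FOO ID="a"><X/></FOO><BAR><FOO><X/></FOO></BAR></DOC>'
)
failures = validate(doc)
```

The FOO inside BAR is caught by both rules, while the top-level FOO is
only subject to the first. Adding a stricter rule later does not replace
the existing ones; it simply joins the list.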

#3. It would be nice to have a clear way of distinguishing metadata about
element types from metadata about constraints. For instance we might want
to describe an element's semantics separate from where we describe why it
has a particular content model in the schema, or allows or doesn't allow a
particular attribute.

---

Now I realize that we often want to put schema and vocabulary information
in the same document. Simple uses of XML should not be penalized for the
complexity of complex applications of it. In fact, simple uses of XML will
only have a single stylesheet: maybe those stylesheet rules should be able
to go "cheek to cheek" with schema and type declaration information.

What we need is to maintain the distinction between these different kinds
of information: type information, constraint information, style
information etc., but not require a separate file per information type. We
probably also want to allow the declarations to go right beside each
other, for maintenance simplicity. In other words, we do not want to
*conflate* the types of declarations, but we do want to allow them in the
same document.

I propose that the namespaces facility is the perfect solution to this
problem. We can break the XSchema specification into three parts (or
perhaps three sections of the same specification).

XDocTypeInfo -- a "bag" DTD that contains a list of declarations.
XSchema -- the definition for the XSC:Rule, XSC:ContentModel, etc. 
XVocab -- XVocab:ElementType, XVocab:Attribute declarations

Here's what I think that this would look like:

<XDTI:XDocTypeInfo>

<XVocab:ElementType Name="A">This is the anchor element.</XVocab:ElementType>

<XSC:Rule>
  <TargetElement Name="A"/>
  <ContentModel><Mixed>...</Mixed></ContentModel>
</XSC:Rule>

<XSC:Rule>
  <A><TargetAttribute Name="HREF"/></A>
  <CDATA/>
</XSC:Rule>

<XSL:Rule>
  <TargetElement Name="A"/>
  <sequence color="blue">
    <children/>
  </sequence>
</XSL:Rule>

<XLink:Rule>
  <TargetElement Name="A"/>

  <XLink:Role Role="Simple"/>
</XLink:Rule>

</XDTI:XDocTypeInfo>
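
As a rough sketch of how a processor could pull its own declarations out
of such a "bag", the Python below groups the children of the container by
namespace. The namespace URIs (urn:example:...), the xmlns declarations
and the function name group_by_namespace are assumptions made for the
example; the proposal itself does not fix any of them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A cut-down bag document. The URIs are placeholders invented for this
# sketch so that the prefixes resolve to something.
BAG = """
<XDTI:XDocTypeInfo xmlns:XDTI="urn:example:xdti"
                   xmlns:XVocab="urn:example:xvocab"
                   xmlns:XSC="urn:example:xschema">
  <XVocab:ElementType Name="A">This is the anchor element.</XVocab:ElementType>
  <XSC:Rule><XSC:TargetElement Name="A"/></XSC:Rule>
  <XSC:Rule><XSC:TargetElement Name="A"/></XSC:Rule>
</XDTI:XDocTypeInfo>
"""


def group_by_namespace(xml_text):
    """Map each namespace URI to the local names of its declarations."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for decl in root:
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        uri, _, local = decl.tag[1:].partition("}")
        groups[uri].append(local)
    return dict(groups)


groups = group_by_namespace(BAG)
```

A schema processor would then read only the declarations under its own
namespace and ignore the rest, which is what lets the different kinds of
declaration sit side by side without being conflated.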

A short form for a set of declarations that all apply to the same element
type might be:

<XDTI:ElementInfo>

</XDTI:ElementInfo>

In this case, you could leave out the patterns in the various rules,
because the target would be implied by the context. We can't force the XSL
and XLink people to go along with this plan, but we can certainly use it
to differentiate our own schema information from our element type
information.

I won't have much time in the near future to defend this proposal, so I
hope that it stands on its own merits.

No, I still don't think that it necessarily means we want to go back into
entity provision territory. At least not text entities. There is still no
legal communication mechanism between the parser and the XSchema
processor. XML would have to be updated to allow the application to supply
entity replacement texts "automatically." And verifying that entities have
a particular value still doesn't seem very interesting.

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

Three things are most perilous: Connectors that corrode
Unproven algorithms, and self-modifying code
http://www.geezjan.org/humor/computers/threes.html
