RE: Suggested guidelines for using local types. (was Re: Enlightenment via avoiding the T-word)
- From: "Fuchs, Matthew" <matthew.fuchs@commerceone.com>
- To: 'James Clark' <jjc@jclark.com>, xml-dev@lists.xml.org
- Date: Tue, 11 Sep 2001 10:53:35 -0700
James,
A long response, slow in coming. Not all the pieces are fully fleshed out,
but I think I'm getting to the essence of the issue, at least from my
perspective. I agree there seems to be much confusion about what the
questions are. However your formulation still doesn't give the questions
I've been addressing, which is, of course, my lead-in to reformulating them.
Hopefully that will also answer why I'm particularly concerned with XSDL,
even though I don't view it as the center of the Universe. If it turns out
that what I thought was an XSDL-specific problem also applies to RELAX NG,
then it would be nice to resolve it similarly.
I see three questions:
> (a) If the meaning/allowed content of an element is highly
> context-dependent, should the name of the element be namespace
> qualified or not?
(b) If the answer to (a) is yes, then what should the namespace be? (There
is a subordinate question here, which you don't answer, of what a namespace
is.)
> (c) If an element is declared by an XSD local type, should the name of the
> element be namespace qualified or not _under current circumstances_?
To foreshadow a little bit, I think the answer to (a) is unequivocally yes,
which may surprise readers of this thread. However, the subtle question is
(b). If the answer to (b) is not "the schema namespace" (in XSDL terms)
then, _under current circumstances_ and following the Engineer's Hippocratic
Oath (if you have to make a decision, make the decision which is easiest to
back out/change later on - don't look for it, I just made it up), the best
thing to do is to leave them unqualified because:
1) it distinguishes them from other elements which can currently be
appropriately namespaced. This is important information for applications.
2) once the appropriate mechanisms are introduced to put them in the
appropriate namespace (whatever that is determined to be), current
instances can be updated by adding information, and those parts of
applications which dealt with them can be isolated and updated. If they've
been put into the schema namespace, then existing information must be
changed - instances rewritten and schema elementFormDefault/form values
changed.
So my basic position, which seems to get lost in the heretical statement
that there could be a reason - however tactical - not to put a name in a
namespace (they catch colds so easily, you know), is that XSDL support for
namespacing local types is incomplete at best. Ultimately, this will need
to be resolved (I hope). In the meantime, use that choice (unqualified)
which is most likely to cause the least pain in moving to the eventual
solution. (Note that even if the final decision is to put locals in the
schema namespace, schemas using the defaults can be converted by adding an
"elementFormDefault='qualified'" attribute and instances by adding default
namespace attributes to elements. No existing information is touched. Note
that for any final decision other than putting all locals in the schema
namespace, any existing instance must be touched.)
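To make the migration path concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The namespace URI and the PurchaseOrder/item names are hypothetical, chosen only for illustration. It shows an unqualified local element and the same instance after a default namespace declaration has been added to that element.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, for illustration only.
NS = "http://example.com/po"

# Local element left unqualified (the XSDL default).
unqualified = ET.fromstring(
    '<po:PurchaseOrder xmlns:po="%s"><item/></po:PurchaseOrder>' % NS
)

# Same instance after adding a default namespace declaration to the
# local element, which places it in the schema namespace.
qualified = ET.fromstring(
    '<po:PurchaseOrder xmlns:po="%s"><item xmlns="%s"/></po:PurchaseOrder>'
    % (NS, NS)
)

def child_tags(root):
    """Return the expanded names of the direct children of root."""
    return [child.tag for child in root]

print(child_tags(unqualified))  # ['item']
print(child_tags(qualified))    # ['{http://example.com/po}item']
```

The second document differs from the first only by an added attribute, which is the sense in which the unqualified choice is cheap to move away from.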
So to me, the crux is the answer to question (b), what should the namespace
be? As you might imagine, I don't think it should be the schema namespace.
I'll argue this from two directions. I'll start by textual exegesis of the
normative portions of the NS rec and associated RFCs. I know this is a
dangerous activity (been slapped for interpreting the (w)rec(k) before, but
what the hell). Then I'll appeal to referential transparency and referential
opacity, concepts developed by Gottlob Frege, the father of modern logic
(there's nothing really complicated here, though).
Turning to the normative part of the NS rec, we see the following statement:
"The combination of the universally managed URI namespace and the document's
own namespace produces identifiers that are universally unique". Naturally,
I next looked for a definition of identifier. I couldn't find one. But I
did find a normative reference to rfc2396, "Uniform Resource Identifiers
(URI): Generic Syntax". In there I found the following definition:
"Identifier: An identifier is an object that can act as a reference to
something that has identity. In the case of URI, the object is a sequence
of characters with a restricted syntax."
If you put this all together, it means something like "the combination of
the universally managed URI namespace and the document's own namespace
produces a universally unique object that can act as a reference to
something that has identity." Tim Bray coined the term "uname" for this.
Let's use it as the name for the kind of universally unique object formed
from the combination described above.
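As a rough illustration of a uname as a (namespace URI, local name) pair, the sketch below relies on the `{uri}local` form that Python's xml.etree.ElementTree uses for element tags. The `uname` helper and the example URI are assumptions made for this example, not anything defined by the NS rec.

```python
import xml.etree.ElementTree as ET

def uname(element):
    """Split an ElementTree tag into a (namespace URI, local name) pair."""
    tag = element.tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (uri, local)
    return (None, tag)  # name in no namespace

# Two documents that spell the same name with different prefixes.
doc_a = ET.fromstring('<a:PurchaseOrder xmlns:a="http://example.com/po"/>')
doc_b = ET.fromstring('<PurchaseOrder xmlns="http://example.com/po"/>')

print(uname(doc_a) == uname(doc_b))  # True
```

The prefix drops out entirely; only the URI and the local name take part in the comparison.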
The phrase "reference to something that has identity" is not further
defined, so now is the time to make reference to the notions of "referential
transparency" and "referential opacity".
Referential transparency is similar to context-free - something is
referentially transparent if it refers to the same "thing" regardless of
context. Functional programming languages are generally considered
referentially transparent if an expression that equates to a value will
always equate to that value - therefore the value can be substituted for the
expression. (This is great for optimization.) When dealing with a name, it
means that the name means the same thing regardless of where it shows up.
It always names (or labels) the same thing. Thousands of years of
diplomacy, a hundred years of logic, and 50 years of programming show that
referential transparency is a Good Thing. Slightly restating a definition
found in [1], it is "the fundamental property of mathematical functions
which enables us to plug together black boxes.... There are a number of
intuitive readings of the term, but essentially it means that each [uname]
denotes a single [type] which cannot be changed ... by allowing different
parts of a [schema] to share the [uname]."
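A small sketch of the programming-language sense of the term, with function names invented for the example: the first function can always be replaced by its value, the second cannot because its result depends on hidden state.

```python
def area(width, height):
    # Referentially transparent: the result depends only on the arguments.
    return width * height

_counter = 0

def next_id():
    # Not referentially transparent: the result depends on hidden state.
    global _counter
    _counter += 1
    return _counter

# area(2, 3) can be replaced by its value wherever it appears.
total = area(2, 3) + area(2, 3)
same_total = 6 + 6

# next_id() cannot: two textually identical calls denote different values.
first, second = next_id(), next_id()
```

A uname that always denotes the same declaration behaves like `area`; a name whose meaning shifts with its surroundings behaves like `next_id`.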
There's now a very interesting split. It revolves around the two questions:
1) what is a namespace (and does it have any intrinsic 'meaning' at all)?
2) is a uname "a reference to something that has identity"?
I would argue that Relax NG (following Makoto's work and XDuce, which is
really a restatement of Makoto's work from a different angle) and XSDL take
opposing answers to these questions, and those opposing answers explain a
lot.
Relax NG, following Makoto's strong grammatical view, considers a namespace
meaningless from the perspective of the schema - applications may impute
meaning later, but from the point of view of Relax, a name is simply a pair
of strings, one of which happens to look like a URI, and one of which
happens to follow the Name production from XML1.0. Likewise, the uname
refers to nothing beyond itself. Trivially, one can enumerate all unames,
and each would be different. The uniqueness business is a trivial property
of the enumerability. Unames are just tokens to be manipulated by the
grammar. The common notion of "element type" is meaningless in Relax NG,
although one could write schemas and applications that behaved as if such a
thing existed. The types of Relax NG are the <define> elements of the core.
This has interesting implications for applications, although I'm intensely
jealous of Daisuke Okajima and RelaxNGCC. In particular, type information
_never_ shows up in an instance - just as looking at an arbitrary character
in a string doesn't tell you which non-terminal that occurrence is defined
by.
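The following toy sketch is a deliberate simplification and not RELAX NG itself. It assumes two hypothetical "defines" that both describe an element named title, and shows that the element name alone does not say which define an occurrence came from.

```python
# Hypothetical defines: each maps to (element name, allowed child names).
# This is a toy model, far simpler than real RELAX NG patterns.
DEFINES = {
    "bookTitle": (("", "title"), {"subtitle"}),
    "chapterTitle": (("", "title"), set()),
}

def defines_matching(name, child_names):
    """Return every define whose name and allowed children fit the element."""
    return sorted(
        define
        for define, (element_name, allowed) in DEFINES.items()
        if element_name == name and set(child_names) <= allowed
    )

# An empty <title/> fits both defines; the instance does not say which.
print(defines_matching(("", "title"), []))
# A <title> with a <subtitle> child fits only one of them.
print(defines_matching(("", "title"), ["subtitle"]))
```

Which define applies is recovered from structure and context during validation; nothing in the name records it.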
XSDL, on the other hand, considers namespaces, or at least those that are
the value of the targetNamespace attribute, to have a very particular meaning.
A namespace names a schema, and a schema creates a set of identifiers and
structure definitions. If you don't include local names, then given a valid
document, unames in the instance are referentially transparent identifiers.
The element uname refers to the unique declaration labeled by the schema
with that name. DTDs (at least after pe's have been expanded out) have the
same property. (Most of the discussion of context is more about what you do
with an element once you have it, than what the element is.) One can
quibble with the direct statement of this (Tim obviously does), but then one
can bring in the mathematical guns - as long as there's a one-to-one mapping
from element names to element types, they're identifiers. Referential
transparency has demonstrated its utility for longer than SGML's been around
- it should take a good reason to lose any of it, beyond the aesthetics of
document appearance or some notion of "best practices" established in an
environment in which either referential transparency was assured (DTDs) or
unenforceable (Relax NG, well-formedness).
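The one-to-one mapping argument can be sketched as a lookup table. Everything here (the namespace, the type names, the `is_referentially_transparent` helper) is hypothetical and only meant to show the shape of the problem: with global declarations a uname picks out one type, while qualified local declarations can give one uname two types depending on the enclosing type.

```python
PO = "http://example.com/po"  # hypothetical schema namespace

# Declarations keyed by (context, uname). Global ones have no context.
GLOBAL_DECLS = {
    (None, (PO, "PurchaseOrder")): "PurchaseOrderType",
    (None, (PO, "Address")): "AddressType",
}

# Local declarations, qualified into the schema namespace: the same uname
# is declared with a different type inside each enclosing type.
LOCAL_DECLS = {
    ("PurchaseOrderType", (PO, "name")): "CompanyNameType",
    ("AddressType", (PO, "name")): "StreetNameType",
}

def is_referentially_transparent(decls):
    """True if every uname maps to exactly one type, whatever the context."""
    seen = {}
    for (_context, uname), type_name in decls.items():
        if seen.setdefault(uname, type_name) != type_name:
            return False
    return True

print(is_referentially_transparent(GLOBAL_DECLS))  # True
print(is_referentially_transparent(LOCAL_DECLS))   # False
```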
One good use of the XSDL approach is making contracts. Suppose two parties
want to come to a common agreement about the form of a PurchaseOrder. I
think it is very convenient to be able to directly refer to that agreement
as the PurchaseOrder type - and when a PurchaseOrder appears in an instance,
the appropriate definition can be retrieved. In the Relax approach, one
cannot directly reference the definition from the document - the decisions
about the structure of PurchaseOrder are indirectly implemented in <define>
elements (as I read the spec). Nor is there any direct way to relate the
different "effective" content models of the PurchaseOrder element. This is
why I generally see RelaxNG and Schematron as functioning best as local
schema languages and the XSDL approach (sic!) as better for "public" schema
languages. The best way to be clear about your type system is to wear it on
your sleeve. That's why I like naming, extension, etc., to be algorithmic.
It has always been my intention to work on type-inference based schema
languages after finishing XSDL, but you know how that went....
Which brings up another weakness of the current XSDL, which cannot
appropriately namespace its own elements - there is no normal form of a
document in which all the pertinent information is available without
validating because certain constructs lack unames - some even lack names.
Type information in the case of anonymous types, and correct namespace
information in the case of local elements, is information that must be added
by the PSVI. Were there to be such a canonical form (in other words, were
everything nameable in a referentially transparent way), then it would be
possible to safely manipulate schema documents with technology for
well-formed XML alone. All the pertinent (i.e., schema related)
metainformation would be directly available in the instance (or inserted by
a single pass through a validator) and there'd be no need for most
applications to refer to the PSVI.
No one should read any of this as an attempt to "dis" RelaxNG, for which I
have a great deal of respect (if not enough free time to truly grok). I
don't see any issue with a synthesis in the future, particularly if you've
left Makoto's beloved closure properties tractable.
Matthew
> -----Original Message-----
> From: James Clark [mailto:jjc@jclark.com]
> Sent: Wednesday, September 05, 2001 7:45 PM
> To: Fuchs, Matthew; 'Jonathan Borden'; xml-dev@lists.xml.org
> Subject: RE: Suggested guidelines for using local types. (was Re:
> Enlightenment via avoiding the T-word)
>
>
> Not everybody seems to be answering the same question here. We can
> distinguish the questions:
>
> (a) If the meaning/allowed content of an element is highly
> context-dependent, should the name of the element be
> namespace qualified or
> not?
>
> (b) If an element is declared by an XSD local type, should
> the name of the
> element be namespace qualified or not?
>
> From my perspective, question (a) is the primary question,
> and although XSD
> may be relevant, it's not an XSD-specific question. It's a
> namespaces
> question. It arises equally if you are using RELAX NG to define your
> vocabulary. People who view XSD as the center of the XML
> universe may view
> (b) as the primary question.
>
> My answer to (a) would be that it should be namespace
> qualified. Here's
> why. I don't see a sharp, binary distinction between
> context-dependent and
> context-independent elements; rather I see a continuum of
> different degrees
> and kinds of context-dependence. For example,
>
> 1. At the most context-independent end of the spectrum, we
> have an element
> like <html> which occurs only as the root element.
>
> 2. Another step down, would be something like <h1> which
> cannot occur as a
> root, but has consistent content model and processing
> wherever it occurs.
>
> 3. Another step down, would be something like a <title>
> element that can
> appear as the child of a <chapter>, <section> or
> <subsection>. It has the
> same content model, but the processing may partly depend on
> the context.
>
> 4. Another case would be an element subject to SGML exceptions. Say a
> <para> may occur inside or outside a <footnote>, but inside
> a <footnote> a
> <para> may not contain a <footnote>. In a DTD, you would not
> be able to
> express the distinction. In RELAX NG, you would use a
> separate pattern for
> the content of a <footnote> in a <para>.
>
> 5. Further towards the context-dependent part of the
> spectrum, would be
> something like <param> in HTML; it is allowed by both
> <object> and <applet>
> with a consistent semantic, but it doesn't have any
> interpretation outside
> its containing <object> or <applet>.
>
> 6. I guess the most context-dependent would be something like
> thead/tbody/tfoot which occur only in a table.
>
> I don't see any point on this continuum where it makes sense
> to draw a line
> and say: above this line namespace-qualify, below this line don't
> namespace-qualify.
>
> I would suggest instead that the question of whether to
> namespace qualify
> should be based on the answer to the question: what is the
> namespace that
> defines the meaning of this element? If there is such a
> namespace, then the
> name of the element should be qualified with that namespace.
> If there is no
> such namespace, then the name of the element should not be
> namespace-qualified.
>
> As for attributes, I would say that the attribute should be namespace
> qualified if (and arguably only if) the meaning of the
> attribute is not
> determined by the namespace of the parent element. This
> implies that the
> name of the attribute that extends the attributes of a
> namespace-qualified
> element should be namespace qualified. This seems a natural
> guideline to
> me. (I think it corresponds to what ##other does in XML Schema for
> anyAttribute.)
>
> One objection to this is that it is not uniform between elements and
> attributes. My response would be that this non-uniformity is
> appropriate
> given that this is primarily a namespaces issue, and given that the
> Namespaces Rec does not default namespaces uniformly for elements and
> attributes.
>
> James
>