RE: Enlightenment via avoiding the T-word

From: "Fuchs, Matthew" <matthew.fuchs@commerceone.com>
To: 'Tim Bray' <tbray@textuality.com>, xml-dev@lists.xml.org
Date: Mon, 27 Aug 2001 15:34:26 -0700

Tim,

You had me with you right down till:

> There is an open debate as to whether or not, in newly
> defined vocabularies, ulabels should be specified for 
> all the elements.  Those who feel that they should be are
> irritated by XSDL's apparent bias in the other direction.

I don't think that's the debate at all.  I certainly think all elements
should have a ulabel in the long term.  I would rephrase this as:

There is an open debate as to how ulabels should be assigned to elements in
vocabularies defined by XSDL, and, if the means of assignment are
insufficient, how to improve the assignment of ulabels and what to do in
the interim.  Those who believe the current mechanism is insufficient (and
misleading) are irritated that those who feel ulabels should be used
everywhere appear willing to limit debate to the inappropriate choices
provided by the current Rec rather than discuss the best long-term solution
and the best forward-compatible interim behavior.

On the issue of context, let me provide a slightly different view of
processing - my alternate reality - without using the T word.

At the bottom there is {0,1}* -- finite strings of 0's and 1's.  At this
level there is no context required to determine what's a 0 and what's a 1.

Unicode maps from {0,1}* to characters.  Determining the significance of
any particular 0 or 1 requires context: the surrounding bits that, together
with it, represent a character.  However, once the map is done, the result
is a string of characters, and no context is required to understand which
character a particular character is - an "A" is an "A" regardless of what
comes in front of or behind it.
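As a small illustration of this step, here is a Python sketch. It works at the byte level rather than the bit level and assumes UTF-8 as the encoding (the post only says "Unicode"). The byte 0xA9 is not a character on its own and contributes to different characters depending on the byte before it; once decoded, each character needs no context.

```python
def decode_utf8(data: bytes) -> str:
    """Map bytes to characters; how a byte is read depends on its neighbours."""
    return data.decode("utf-8")

# On its own, the continuation byte 0xA9 is not a character at all.
try:
    decode_utf8(bytes([0xA9]))
    lone_byte_decodes = True
except UnicodeDecodeError:
    lone_byte_decodes = False

# Preceded by different lead bytes, it contributes to different characters.
e_acute = decode_utf8(bytes([0xC3, 0xA9]))         # U+00E9, e with acute
copyright_sign = decode_utf8(bytes([0xC2, 0xA9]))  # U+00A9, copyright sign

# After decoding, each character stands alone: an "A" is an "A" anywhere.
text = decode_utf8(b"xAy")

print(lone_byte_decodes, hex(ord(e_acute)), hex(ord(copyright_sign)), text[1])
```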

Well-formed XML 1.0 maps from strings of Unicode characters to documents
(basic tree model view).  Determining the significance of any character in
the production of the document requires context (is it part of a name, the
start of a tag, etc.?).  However, once that's done, the result is an infoset
of a document, and no context is needed to determine what's an element,
what's data, what's an attribute, etc.  It's also clear what the labels of
these things are.
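A quick way to see this with Python's standard xml.etree.ElementTree module: the parser reads the surrounding characters to decide what each character belongs to, and afterwards the tree answers "element, attribute, or data?" directly. The classify helper is a hypothetical illustration, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

def classify(elem: ET.Element):
    """Yield (kind, value) pairs read straight off the parsed tree."""
    yield ("element", elem.tag)
    for name, value in elem.attrib.items():
        yield ("attribute", f"{name}={value}")
    if elem.text and elem.text.strip():
        yield ("data", elem.text)
    for child in elem:
        yield from classify(child)
        if child.tail and child.tail.strip():
            yield ("data", child.tail)

# The character "b" is an element label, an attribute label, an attribute
# value, and character data; only the parser, reading the surrounding
# characters, can tell which.
root = ET.fromstring('<b b="b">b</b>')
items = list(classify(root))
print(items)
```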

DTDs provide a mapping from well formed documents into, essentially, a term
algebra for a particular set of structures.  I.e., once you've validated a
document, any element structure (but, alas, not attributes - but at least
there's only one level of nesting) with a particular label is of the set of
structures "labeled" by an element definition with the same label.  Of
course, as you've pointed out, this is only one piece of the semantics, the
"label->meaning" map, but it's the piece validation provides.  Now,
validating an element requires access to its context in the document, but
once it's been validated, you don't need access to the context to establish
which definition applies to it - that much semantics is uniquely identified
by the label and the DTD.  The context in which an element of a particular
definition appears becomes important at the next level, application
semantics - but that level at least knows which element is appearing there,
to the extent that the DTD can specify it.  There may in fact be multiple
levels above this, with the same iterative behavior.
If you don't use validation at all, then you have some other means of
determining what a label signifies.
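The label->definition property can be sketched as follows. This is a toy under stated assumptions: the DTD is hand-translated into a Python dict (the name DTD is hypothetical), content models are written as regular expressions over space-separated child labels, and only element content is checked (no attributes, no mixed content). Python's standard library does not validate against DTDs, so this is not a real DTD processor.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD, hand-translated: each label maps to exactly one definition.
#   <!ELEMENT order (customer, item+)>
#   <!ELEMENT customer (#PCDATA)>
#   <!ELEMENT item (#PCDATA)>
DTD = {
    "order": r"customer( item)+",
    "customer": "#PCDATA",
    "item": "#PCDATA",
}

def validate(elem: ET.Element) -> bool:
    """Check elem and its descendants; each check finds its definition by label."""
    model = DTD.get(elem.tag)
    if model is None:
        return False
    children = [child.tag for child in elem]
    if model == "#PCDATA":
        ok = not children  # text only, no child elements
    else:
        ok = re.fullmatch(model, " ".join(children)) is not None
    return ok and all(validate(child) for child in elem)

def definition_of(elem: ET.Element) -> str:
    """After validation, the label alone identifies the definition."""
    return DTD[elem.tag]

doc = ET.fromstring(
    "<order><customer>Ann</customer><item>pen</item><item>ink</item></order>"
)
print(validate(doc), definition_of(doc[1]))
```

Note that validate has to look at an element's children, while definition_of looks at nothing but the label.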

The desire to "mix" vocabularies screwed up the nice label->definition
mapping above, or even a "label->meaning" mapping without DTD.  We could've
said "use context", but we didn't because people (including you, Tim)
thought _context wasn't sufficient_.  Namespaces provide a mapping from
straight 1.0 documents to ulabeled documents.  In order to figure out the
ulabel of an element or attribute, you need access to the context of the
node.  However, when you're done, every node has a ulabel, and you don't
need access to a node's context to determine its ulabel.  It wouldn't have
been hard to create a version of DTDs that would support Namespaces, but we
went right to schemas, so it never happened (as I write this, old memories
arise, but I'll just slap them down).  If we had, then this paragraph would
have gone between the last two.
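Python's ElementTree performs this namespace step at parse time and reports each name in "{URI}local" (Clark) notation, which can stand in for a ulabel here. In the sketch below (the urn:example URIs are made up), the unprefixed label item gets a different ulabel depending on which default namespace is in scope on its ancestors, while the unprefixed attribute code gets no namespace at all, because default namespace declarations do not apply to attributes.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    '<po:order xmlns:po="urn:example:po" xmlns="urn:example:default">'
    '<item code="1"/>'                 # default namespace in scope here
    '<inner xmlns=""><item/></inner>'  # default namespace undeclared here
    '</po:order>'
)

def ulabels(root: ET.Element) -> list[tuple[str, list[str]]]:
    """Pair each element's ulabel with its attributes' ulabels, in document order."""
    return [(el.tag, sorted(el.attrib)) for el in root.iter()]

for tag, attrs in ulabels(ET.fromstring(SOURCE)):
    print(tag, attrs)
```

Working out each ulabel needed the ancestors' declarations; reading it back afterwards needs only the node.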

XSDL, however, is supposed to be a validation mechanism that supports
Namespaces, as DTDs don't.  However, just as DTDs provide a mapping between
labels and definitions, one would hope that XSDL would provide a bijection
between ulabels and definitions - at least for elements.  Of course, this
wouldn't exhaust the semantics that could be applied to structures, but at
least within the context of working with XSDL, and for the benefit of people
trying to use XSDL for work, it is highly desirable.  In other words, once
a document has been validated, every element should have a ulabel, and from
the ulabel alone you should be able to establish which definition applies to
that element within the context of XSDL validation.  This doesn't mean that
further processing doesn't depend on the context.  It doesn't even mean that
XSDL is the final word on which constraints establish that mapping; one can
always postprocess.  It just means that if people go through the (currently
not insignificant) effort to use XSDL, that's one of the important pieces of
information XSDL should give them.  And, ta da!, it doesn't do that.  There
is no injective mapping from ulabels to definitions.  A ulabel maps to a set
of definitions, and if you want to know which is the "true" definition, you
need to either reparse the surrounding element(s) or hope there's a PSVI
available (not insignificant overhead for many processors) so you can do
pointer comparisons, or whatnot.  However, if local elements (as described
in XSDL) are not given ulabels (are unqualified), then you still get an
injective mapping, and the hope of fixing things later (by retrofitting in
the least disruptive way).
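The difference between qualified and unqualified local elements can be sketched as a check on the ulabel->definition mapping. Everything here is hypothetical: the schema is modeled as a short Python list of declarations rather than parsed from an XSD file, the target namespace urn:example:po is made up, and, following the argument above, an unqualified local element is treated as getting a plain label and no ulabel. With elementFormDefault set to qualified, the two local "name" declarations collapse onto one ulabel; with unqualified, every ulabel picks out exactly one declaration.

```python
from collections import defaultdict
from dataclasses import dataclass

TNS = "urn:example:po"  # hypothetical target namespace

@dataclass(frozen=True)
class Decl:
    name: str
    scope: str      # "global", or the complex type that declares it locally
    type_name: str

# Hypothetical schema: one global element and two local elements that share
# the name "name" but have different definitions.
DECLS = [
    Decl("order", "global", "OrderType"),
    Decl("name", "CustomerType", "xs:string"),
    Decl("name", "ProductType", "xs:token"),
]

def ulabel(decl: Decl, element_form_default: str):
    """Return the ulabel for a declaration, or None for an unqualified local."""
    if decl.scope == "global" or element_form_default == "qualified":
        return f"{{{TNS}}}{decl.name}"
    return None  # unqualified local: plain label only, no ulabel

def ulabel_map(element_form_default: str) -> dict:
    """Group declarations by ulabel, skipping declarations that get none."""
    groups = defaultdict(list)
    for d in DECLS:
        u = ulabel(d, element_form_default)
        if u is not None:
            groups[u].append(d)
    return dict(groups)

def is_injective(mapping: dict) -> bool:
    """True when no two declarations share a ulabel."""
    return all(len(ds) == 1 for ds in mapping.values())

qualified = ulabel_map("qualified")
unqualified = ulabel_map("unqualified")
print(is_injective(qualified), is_injective(unqualified))
```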

What I've argued here is that each level of processing should resolve its
own context sensitive issues, rather than leak them to the levels above.
Just as it is better if applications don't need to know about the actual
prefixes and locations of namespace attributes in the source documents, it
is better if applications don't need to decipher uses of elementForm in a
schema to know which element is which.
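A small check of this principle using ElementTree again: three spellings of one document, with different prefixes or a default namespace declaration instead of a prefix, resolve to identical ulabels, so an application working from ulabels never sees the prefixes. The urn:example:po URI is invented for the example.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same vocabulary: different prefixes, and one that
# uses a default namespace declaration instead of a prefix.
SPELLINGS = [
    '<po:order xmlns:po="urn:example:po"><po:item/></po:order>',
    '<x:order xmlns:x="urn:example:po"><x:item/></x:order>',
    '<order xmlns="urn:example:po"><item/></order>',
]

def element_ulabels(source: str) -> list[str]:
    """Return the ulabels of all elements, in document order."""
    return [el.tag for el in ET.fromstring(source).iter()]

results = [element_ulabels(s) for s in SPELLINGS]
print(results[0])
```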

An appropriate ulabeling mechanism for XSDL would provide an injective
mapping for its own purposes.  With such a mapping it would not be necessary
to have the PSVI always present to get work done - just as OO applications
generally don't need class definitions in memory.  The lack of an injective
mapping is, in my opinion, a major issue.  People should demand this be
fixed.  And in the meantime, if someone tells you, as an XSDL user, to just
put all your elements in the schema namespace, just say no.

Matthew

> -----Original Message-----
> From: Tim Bray [mailto:tbray@textuality.com]
> Sent: Saturday, August 25, 2001 6:31 PM
> To: xml-dev@lists.xml.org
> Subject: Enlightenment via avoiding the T-word
> 
> 
> In reviewing this thread, it seems to me that there is a certain
> word beginning with T the occurrence of which is strongly 
> correlated with exuberant bursts of extrapolation and exegesis.
> 
> So let me present an alternate reality which I claim to be
> internally consistent, consistent with all interesting real-world
> software implementations, and free of voodoo or magic:
> 
> XML provides a method for textually encoding data objects;
> the encoding allows the components of the data objects (whether
> provided as elements or attributes) to be given labels.  Let's
> call these labels "labels".  The label syntax is given by the 
> "Name" production from the XML recommendation.
> 
> XML Namespaces allow the labels to be extended by the addition
> of a URI reference.  Let's call these extended labels "ulabels".
> Because of the defaulting mechanism, ulabels cannot be 
> distinguished syntactically from labels without processing their 
> context.
> 
> A large variety of software programs (and humans) process XML.
> In selecting which components to process, and what processing to
> do, they are observed to use a wide variety of input information.
> Most obvious is the label (or ulabel if provided), but other
> relevant information can include
>  - the context of the component in the XML document
>  - whether it has an attribute with a particular label
>  - the value of an attribute with a particular label
>  - some external operation based on the content of the 
>    component; i.e. treat it as a part number, look up the
>    inventory in a database, and process it only if the
>    count on hand is below a critical value.
>  - entirely external information such as the time of day or
>    the identity of the software's user
> 
> One class of software application is called "validation", 
> which consists in determining whether one or more components 
> in an XML document (or the entire document) meet the 
> constraints described in a declarative specification usually 
> called a "schema".
> 
> The original XML recommendation included the specification of 
> a constraint language.  This has supported the mistaken belief 
> that validation is uniquely special and important among all 
> the classes of applications which process XML.
> 
> Historically, the only validation available was based on 
> the DTD (an acronym we can't expand here).  This ties 
> constraints to elements *only* on the basis of their label,
> and to attributes based on the combination of their label 
> and that of the element to which they are attached.  This 
> limitation, and an unfortunate choice of terminology in the 
> XML recommendation, has supported the mistaken belief that 
> labels are mystically tied one-to-one to validation 
> constraints and other semantics.  DTD validation has no 
> support for the use of ulabels.
> 
> Modern validation facilities such as XSDL, Schematron, and 
> Relax allow the tying of constraints to components in a much 
> more flexible way, including element context.  They also 
> provide good support for constraining combinations of 
> components with labels, ulabels, or a mixture of the two.
> 
> There is an open debate as to whether or not, in newly
> defined vocabularies, ulabels should be specified for 
> all the elements.  Those who feel that they should be are
> irritated by XSDL's apparent bias in the other direction.
> 
> At the end of the day, labels are just labels.  They are
> one of the stepping stones from content to semantics, but
> only one.  
> 
> If anyone wants to take issue with this, please try to do
> so without using the T-word. -Tim
> 
> 
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> 
> The list archives are at http://lists.xml.org/archives/xml-dev/
> 
> To subscribe or unsubscribe from this elist use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
>