Enlightenment via avoiding the T-word
- From: Tim Bray <firstname.lastname@example.org>
- To: email@example.com
- Date: Sat, 25 Aug 2001 18:30:46 -0700
In reviewing this thread, it seems to me that there is a certain
word beginning with T the occurrence of which is strongly
correlated with exuberant bursts of extrapolation and exegesis.
So let me present an alternate reality which I claim to be
internally consistent, consistent with all interesting real-world
software implementations, and free of voodoo or magic:
XML provides a method for textually encoding data objects;
the encoding allows the components of the data objects (whether
provided as elements or attributes) to be given labels. Let's
call these labels "labels". The label syntax is given by the
"Name" production from the XML recommendation.
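
To make "labels" concrete, here is a minimal sketch that lists the labels on the elements and attributes of a small document. It uses Python's standard xml.etree.ElementTree; the sample document and the function name labels() are invented for illustration and are not part of any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<order id="o-17">
  <part number="A100" qty="3">Widget</part>
  <part number="B200" qty="1">Gasket</part>
</order>"""

def labels(xml_text):
    """Return (element_labels, attribute_labels) in document order."""
    root = ET.fromstring(xml_text)
    element_labels = []
    attribute_labels = []
    for elem in root.iter():
        # With no namespaces in play, the tag is just the plain label.
        element_labels.append(elem.tag)
        attribute_labels.extend(elem.attrib.keys())
    return element_labels, attribute_labels

element_labels, attribute_labels = labels(DOC)
print(element_labels)
print(attribute_labels)
```

Nothing here says what "part" or "qty" mean; the parser hands back labels and leaves the rest to whatever software reads them.
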
XML Namespaces allow the labels to be extended by the addition
of a URI reference. Let's call these extended labels "ulabels".
Because of the defaulting mechanism, ulabels cannot be
distinguished syntactically from labels without processing their
A large variety of software programs (and humans) process XML.
In selecting which components to process, and what processing to
do, they are observed to use a wide variety of input information.
Most obvious is the label (or ulabel if provided), but other
relevant information can include
- the context of the component in the XML document
- whether it has an attribute with a particular label
- the value of an attribute with a particular label
- some external operation based on the content of the
component; i.e. treat it as a part number, look up the
inventory in a database, and process it only if the
count on hand is below a critical value.
- entirely external information such as the time of day or
the identity of the software's user
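
The list above can be sketched as a single selection routine. This is only an illustration under stated assumptions: the namespace URI, the document, the INVENTORY table (standing in for a database lookup), and the CRITICAL threshold are all hypothetical. It selects components by ulabel, by context, by an attribute value, and by an external lookup keyed on an attribute.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:orders"  # hypothetical namespace URI

# The <part> elements carry no prefix; their ulabel comes from the
# default namespace declared on <order>.
DOC = f"""<order xmlns="{NS}">
  <part number="A100" status="active">Widget</part>
  <part number="B200" status="active">Gasket</part>
  <part number="C300" status="retired">Flange</part>
  <note><part number="A100" status="active">mentioned in passing</part></note>
</order>"""

# Stand-in for the external database lookup; the values are made up.
INVENTORY = {"A100": 2, "B200": 40, "C300": 0}
CRITICAL = 5

def parts_to_reorder(xml_text, inventory, critical):
    root = ET.fromstring(xml_text)
    part = f"{{{NS}}}part"  # ElementTree spells a ulabel as {uri}label
    selected = []
    # Context: findall() with a bare tag matches direct children only,
    # so the <part> nested inside <note> is not considered.
    for elem in root.findall(part):
        # Attribute value: skip anything not marked active.
        if elem.get("status") != "active":
            continue
        # Attribute presence: a part without a number cannot be looked up.
        number = elem.get("number")
        if number is None:
            continue
        # External operation: process only if the count on hand is low.
        if inventory.get(number, 0) < critical:
            selected.append(number)
    return selected

to_reorder = parts_to_reorder(DOC, INVENTORY, CRITICAL)
print(to_reorder)
```

The ulabel is one input among four here, and dropping any one of the checks changes which components get processed.
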
One class of software application is called "validation",
which consists in determining whether one or more components
in an XML document (or the entire document) meet the
constraints described in a declarative specification usually
called a "schema".
The original XML recommendation included the specification of
a constraint language. This has supported the mistaken belief
that validation is uniquely special and important among all
the classes of applications which process XML.
Historically, the only validation available was based on
the DTD (an acronym we can't expand here). This ties
constraints to elements *only* on the basis of their label,
and to attributes based on the combination of their label
and that of the element to which they are attached. This
limitation, and an unfortunate choice of terminology in the
XML recommendation, has supported the mistaken belief that
labels are mystically tied one-to-one to validation
constraints and other semantics. DTD validation has no
support for the use of ulabels.
Modern validation facilities such as XSDL, Schematron, and
Relax allow the tying of constraints to components in a much
more flexible way, including element context. They also
provide good support for constraining combinations of
components with labels, ulabels, or a mixture of the two.
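
As a rough illustration of tying constraints to context rather than to the label alone, the sketch below keys its rules on the pair (parent label, label). It is hand-rolled Python, not XSDL, Schematron, or Relax syntax, and the document, the RULES table, and the HONORIFICS set are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which the label "title" appears in two contexts.
DOC = """<library>
  <book><title>Dune</title></book>
  <person><title>Dr</title><name>Ada</name></person>
  <person><title>Dune</title><name>Bob</name></person>
</library>"""

HONORIFICS = {"Dr", "Ms", "Mr"}

# Rules keyed on (parent label, label): the same label gets a different
# constraint depending on where it occurs.
RULES = {
    ("book", "title"): lambda text: len(text) > 0,
    ("person", "title"): lambda text: text in HONORIFICS,
}

def violations(xml_text):
    """Return (parent label, label, text) for every failed rule."""
    root = ET.fromstring(xml_text)
    found = []
    for parent in root.iter():
        for child in parent:
            rule = RULES.get((parent.tag, child.tag))
            text = (child.text or "").strip()
            if rule is not None and not rule(text):
                found.append((parent.tag, child.tag, text))
    return found

problems = violations(DOC)
print(problems)
```

A rule table keyed on the label alone could hold only one entry for "title", which is the DTD limitation described earlier.
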
There is an open debate as to whether or not, in newly
defined vocabularies, ulabels should be specified for
all the elements. Those who feel that they should be are
irritated by XSDL's apparent bias in the other direction.
At the end of the day, labels are just labels. They are
one of the stepping stones from content to semantics, but
If anyone wants to take issue with this, please try to do
so without using the T-word. -Tim