A bit of pedantry (was Personal reply to Edd Dumbill's XML HackArticle. . .)
- From: "W. E. Perry" <email@example.com>
- To: XML DEV <firstname.lastname@example.org>
- Date: Tue, 13 Mar 2001 02:43:00 -0500
"Thomas B. Passin" wrote:
> In this sense, your semantics may be my syntax. XML markup lets a processor
> know how to build a generic tree, or, SAX-like, send certain events. So in a
> way the markup functions as a kind of semantics for an xml document, at least
> at the level of the lowest level of processing. It appears that there are
> layers to this semantics business.
Yes, there are layers. One underlying question in this discussion is which of
those layers are inherently XML. In my experience of this list, and of the
community it serves, general agreement extends only as far as well-formed XML
markup. Even within the XML 1.0 specification, the question of the
canonical semantics of XML is raised by the optional nature of validation. The
dividing line seems always to be a process. To elaborate from the syntax of an
instance of well-formed XML the additional semantics of validity requires further
processing. That such processing is optional under the XML 1.0 specification
corroborates the conclusion that the inherent nature of XML syntax extends only so
far as well-formedness. That same conclusion also permits e.g. the W3C XML Schema
mechanism to substitute different processing than that required for DTD-based
validation and, as its outcome, the elaboration of different semantics. The
examples you cite of building a generic tree (i.e., W3C DOM and some analogues) or
emitting SAX events are other alternative processes which may be invoked to
elaborate additional semantics beyond those of simple well-formedness checking.
Namespace support also requires its own processing, which elaborates further
semantics. The contentious question is always which of these processes are 'core'
or even mandatory to XML or, asked from the opposite perspective, which of them
elaborate the expected or canonical semantics of an XML instance. The Infoset
offers one prescriptive answer to that question, but not the only possible one. In
disposition of those instances which do not meet its expectations (as e.g. by
invoking no namespace processing) the Infoset declares that they have no infoset,
which is probably a reasonable laissez-faire conclusion. The contentious political
problem arises when other specifications take the Infoset as their starting point
and thereby imply (or more than imply) that its semantic elaboration, one out of
many possible, is the baseline definition of XML.
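
To make the namespace case concrete, here is a minimal sketch in Python, assuming
the standard library's xml.sax module. The sample document, the
urn:example:orders namespace, and the names Names and element_names are
hypothetical illustrations only. The same well-formed instance is parsed twice,
once with namespace processing switched off and once with it switched on, and
the element names reported differ accordingly.

```python
import xml.sax
from io import BytesIO
from xml.sax.handler import ContentHandler, feature_namespaces

# Hypothetical sample instance; well-formed whether or not namespaces are processed.
DOC = b'<a:order xmlns:a="urn:example:orders"><a:item>widget</a:item></a:order>'


class Names(ContentHandler):
    """Record element names exactly as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # Called when namespace processing is off: name is the raw qualified name.
        self.seen.append(name)

    def startElementNS(self, name, qname, attrs):
        # Called when namespace processing is on: name is a (uri, localname) pair.
        self.seen.append(name)


def element_names(doc, namespace_aware):
    """Parse doc with namespace processing on or off and return the names seen."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespace_aware)
    handler = Names()
    parser.setContentHandler(handler)
    parser.parse(BytesIO(doc))
    return handler.seen


plain = element_names(DOC, False)
aware = element_names(DOC, True)
print(plain)
print(aware)
```

Neither run alters the instance itself; the difference lies entirely in which
process was invoked.
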
In my skepticism about the Infoset I am probably in the minority, but I attempt to
reinforce my opinion by always speaking of additional semantics as elaborated from
processes beyond what is required for simple well-formedness checking. Because
there is no doubt what those processes are, I do not think we should blur the line
between syntax and semantics by saying 'your semantics may be my syntax'. You may
well regard the processes by which you elaborate your semantics as essential to
the nature of XML, but you do not thereby drive those processes onto the common
ground which well-formedness occupies, and from which any additional process
beyond well-formedness checking loses a significant number of adherents in this
> I ask again, why are we using markup? From the above viewpoint, the markup
> moves a layer of semantics out of the processor and into the document or data
> set (of course, doing so in a standard way).
I hope that it is now clear why I disagree so strongly with this characterization.
The instance document has an integrity which is grounded in its markup syntax and
unchanged by whatever processing may be performed, and whatever semantics thereby
elaborated. Processing never creates a new baseline in the sense that the document
is a syntactic baseline. Yes, we need to pipeline these processes in order to
achieve highly specific data handling solutions. Yet the underlying documents
remain intact, and in different circumstances, for different purposes, may be
submitted to entirely different processes, and yield on each occasion
appropriately different semantics.
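
As a rough illustration of that last point, the sketch below submits one
instance to three separate processes: a bare well-formedness check, a generic
tree build, and an event-driven text extraction. It assumes Python's standard
xml.parsers.expat and xml.dom.minidom modules. The sample document and the
function names are hypothetical, chosen only for this example.

```python
import xml.parsers.expat
from xml.dom import minidom

# Hypothetical sample instance.
DOC = b'<note priority="high"><to>list</to><body>hello</body></note>'


def is_well_formed(doc):
    """Run only the well-formedness check; no tree is built, no events are kept."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(doc, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


def child_element_names(doc):
    """Build a generic DOM tree and read the root's child element names from it."""
    root = minidom.parseString(doc).documentElement
    return [n.tagName for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]


def text_content(doc):
    """Consume character-data events only, ignoring the element structure."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc, True)
    return "".join(chunks)


before = bytes(DOC)
results = (is_well_formed(DOC), child_element_names(DOC), text_content(DOC))
print(results)
# The instance is byte-for-byte what it was before any processing.
print(DOC == before)
```

Each function elaborates something different from the same bytes, and none of
them changes those bytes.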