OASIS Mailing List Archives



   Re: [xml-dev] The privilege of XML parsing [in an internetworked web]


"Henry S. Thompson" wrote:

> Indeed -- that's why W3C XML Schema _loosened_ the binding between document and
> schema, compared to XML 1.0 wrt DTDs -- an application (read 'consumer') is free
> to mandate its own W3C XML Schema (or none) in preference to whatever the author
> provides.  What's the problem?

The problem is that, whether the schema is specified by the document creator or by
the document consumer, we are still looking at constraints upon input, or at
questions of conformance decided by the form of input to a process. This is not
surprising:  it is our heritage from SGML's fundamental concept of validity (or for
others of us, from the fundamental concept of programming to an interface).
However, such notions of validity or interface conformance are inimical to the
fundamental architecture of the Web or, more broadly, to processing in general in
the internetwork topology.

Web architecture specifies neither interfaces nor validity constraints upon
documents input to processes. Web architecture specifies the use of the http verbs
PUT, POST and GET with URLs as arguments. Web architecture is about resolving a
URL to PUT or POST a document to the location addressed, or about dereferencing a
URL in order to GET an entity body representing the current state of a document.
There is no Web mechanism to specify, let alone enforce, validity or interface
conformance constraints upon the use of http verbs given URLs as arguments. A
process making appropriate use of the Web architecture or internetwork topology has
clear-cut, well-understood standard methods on-the-Web for publishing the documents
which it produces or for fetching entity bodies representing the documents which
others publish. There are no standard on-the-Web methods for verifying--let alone
enforcing--validity or interface conformance (nor even any generally accepted means
on-the-Web for specifying those constraints or enforcing their association with
particular documents).
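To make this concrete, here is a minimal Python sketch assuming a toy in-memory server; the `STORE` dict, `DocumentHandler`, and the `put` and `get` helpers are hypothetical names for illustration only. The only operations are PUT to a URL and GET from it. Nothing in the exchange carries or enforces a validity constraint, so a malformed body labelled `application/xml` is stored and returned just as faithfully as a well-formed one.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store: URL path -> (media type, entity body bytes).
STORE: dict[str, tuple[str, bytes]] = {}


class DocumentHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        media_type = self.headers.get("Content-Type", "application/octet-stream")
        # No validity or schema check: the verb itself offers no place for one.
        STORE[self.path] = (media_type, body)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        if self.path not in STORE:
            self.send_error(404)
            return
        media_type, body = STORE[self.path]
        self.send_response(200)
        self.send_header("Content-Type", media_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # the entity body, byte for byte

    def log_message(self, *args):
        pass  # keep the sketch quiet


def start_server() -> HTTPServer:
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put(url: str, body: bytes, media_type: str) -> int:
    req = urllib.request.Request(url, data=body, method="PUT",
                                 headers={"Content-Type": media_type})
    with urllib.request.urlopen(req) as resp:
        return resp.status


def get(url: str) -> tuple[str, bytes]:
    with urllib.request.urlopen(url) as resp:
        return resp.headers.get_content_type(), resp.read()


if __name__ == "__main__":
    server = start_server()
    url = f"http://127.0.0.1:{server.server_port}/orders/42"
    put(url, b"<order><item>widget</item></order>", "application/xml")
    print(get(url))
    server.shutdown()
    server.server_close()
```

Whatever checking that body receives has to happen inside the process that fetches it.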

Validity or conformance checking of input takes place inside the process boundary
of an idiosyncratic operation, and therefore effectively off-the-Web. This
distinction is not specious. Understanding that any validity or conformance
checking or enforcement must be done by idiosyncratic code inside the boundary of a
particular process is very helpful in understanding the mechanism by which any data
moves onto or off of the Web through that process boundary. What moves on the wire
is an entity body--and if properly standards-compliant, one accurately described by
a MIME type. In this mechanism, XML is in no way privileged above other types, nor,
despite the history of the Web, is HTML. However, because it is XML, that entity
body must at the process boundary first be parsed and successfully verified as
well-formed. From that point what happens to it is under entirely local control
within the process but is implicitly off-the-Web because its current form--whatever
it may be as the output of a parse--is not the entity body which legitimately
travels the internetwork.
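A minimal sketch of that boundary step in Python, using the standard library's `xml.etree.ElementTree` (the function names are hypothetical): the entity body is parsed and rejected if it is not well-formed, and nothing further is checked. The resulting tree is already a local artifact; serializing it again need not reproduce the bytes that came over the wire.

```python
import xml.etree.ElementTree as ET


def accept_entity_body(body: bytes) -> ET.Element:
    """Parse an XML entity body at the process boundary.

    Raises ET.ParseError if the body is not well-formed. Nothing beyond
    well-formedness is checked here; any further constraint is local policy.
    """
    return ET.fromstring(body)


def is_well_formed(body: bytes) -> bool:
    try:
        accept_entity_body(body)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    wire = b"<a x='1'/>"
    tree = accept_entity_body(wire)
    # The local tree re-serializes differently from the original entity body.
    print(wire, ET.tostring(tree))
```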

With XML the normal procedure is to build a tree on the output of that parse. How
that tree is shaped by the particular processing of includes, links, entity
expansions, etc., and how that tree is decorated on any one occasion by type
information or by various annotations is controlled by the local process and
can clearly produce an idiosyncratic outcome. In other words, there is no reason to
believe that the tree instantiated by, and within the boundary of, any particular
process will conform to any other tree, particularly not the tree which the
original publisher of a document might have had in mind. This is the privilege of
XML parsing:  the entirely local control of how a data structure is instantiated on
the output of the parsing which is required when an XML entity body is brought into
a process. Whether the choice is to instantiate a structure specified by
the publisher of the original document or to enforce validity constraints which that
publisher prefers, the choice is local within a process which is opaque to that
original publisher. It is therefore a choice to use the consumer's data structure
and validity constraints, even if what is chosen comes from, or is approved by, the
original document publisher.
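As an illustration, assuming a hypothetical catalog document and two hypothetical consumers, the same entity body can be instantiated into quite different local structures. Neither consumer consults a schema, and neither resulting structure resembles the tree the publisher might have had in mind.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity body, as it might arrive from a GET.
ENTITY_BODY = b"""<catalog>
  <book id="b1"><title>Dune</title><price currency="USD">9.99</price></book>
  <book id="b2"><title>Emma</title><price currency="USD">4.50</price></book>
</catalog>"""


def pricing_view(body: bytes) -> dict[str, float]:
    """Consumer A: keeps only id -> price; titles are discarded."""
    root = ET.fromstring(body)
    return {b.get("id"): float(b.findtext("price")) for b in root.iter("book")}


def title_index(body: bytes) -> list[str]:
    """Consumer B: keeps only a sorted list of titles; ids and prices are dropped."""
    root = ET.fromstring(body)
    return sorted(b.findtext("title") for b in root.iter("book"))


if __name__ == "__main__":
    print(pricing_view(ENTITY_BODY))
    print(title_index(ENTITY_BODY))
```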

I think a fundamental misunderstanding is that interoperability requires the
instantiation of the same data structure at each of two interoperating processes.
That is the fundamental assumption of two-phase commit, but it is an assumption
which can be implemented only within a homogeneous traditional enterprise network.
It is also the underlying assumption of validity, which is why validity is
incongruous and in general unachievable on the Web. On an internetwork, internal
operations of processes which might seek to interoperate are opaque to each other,
including the data structures which they expect as input. Outside the process
boundary, on the internetwork, there are only the entity bodies of documents,
ideally conforming to appropriate MIME types. Interoperability is achieved when one
process can use the output of another--that is, what is published at a URL and can
be retrieved with an http GET--for its own purposes. Necessarily, 'for its own
purposes' means that the consuming process instantiates a data structure
specifically suited to the operation of that process. Because it is virtually
inevitable that this structure will differ from the structure used by an upstream
process for its own purposes, interoperability rests on a particular instance, an
entity body, shared through the operation of http verbs. That entity body is the very stuff
which moves on the Web, but it is not a data structure as is required by the
operation of processes, nor is it an archetype for such structures. That entity
body is itself a concrete instance and on a particular occasion might be the nexus
through which processes interoperate, whether or not its content, or content model,
is in any way specific to what a receiving process operates upon.
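A hedged sketch of interoperating through the entity body alone, in Python with hypothetical names (`Order`, `publish`, `total_quantity`): the upstream process serializes its own internal structure, and the downstream process builds only the integer it needs. When the publisher adds an element the consumer was never told about, the consumer's result is unaffected, because no shared content model was assumed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    """Upstream process's own (hypothetical) internal structure."""
    order_id: str
    lines: dict[str, int]  # SKU -> quantity


def publish(order: Order, extra_note: Optional[str] = None) -> bytes:
    """Upstream: serialize the internal structure to an XML entity body."""
    root = ET.Element("order", id=order.order_id)
    for sku, qty in order.lines.items():
        ET.SubElement(root, "line", sku=sku, qty=str(qty))
    if extra_note is not None:
        # Content the downstream process was never told about.
        ET.SubElement(root, "note").text = extra_note
    return ET.tostring(root)


def total_quantity(body: bytes) -> int:
    """Downstream: instantiates only what it needs, an integer total."""
    root = ET.fromstring(body)
    return sum(int(line.get("qty", "0")) for line in root.iter("line"))


if __name__ == "__main__":
    order = Order("o1", {"A": 2, "B": 3})
    print(total_quantity(publish(order)))
    print(total_quantity(publish(order, "rush order")))
```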

So, finally, no 'loosening'--short of disconnection--between a document instance
and a possible schema for that instance is sufficiently loose to fit--or be
natively implementable in--the Web architecture. Processes operating on the Web may
use Web verbs to effect particular connections on a particular occasion which
result in an idiosyncratic data structure appropriate to that operation of a
particular process. This is utterly at odds with the premise of validity, which
insists that, however a document is connected to a particular schema, it must
conform to that schema before it might legitimately be processed.


Walter Perry



Copyright 2001 XML.org. This site is hosted by OASIS