W. E. Perry wrote:
> Alaric B Snell wrote:
> > There are several data models (and, hence, equivalence tests) in
> > circulation for XML - but every application uses at least one of
> > them... Anyone using XSLT will be looking at the XPath tree model;
> > likewise, CSS users will be looking at something similar.
> > Developers will be looking at DOM or SAX or something else.
> > So yes, what the XML encodes is the application's problem - but
> > without applications to read it, or at least the possibility of
> > applications to read it, a bit of XML is just a string of
> > bits, with no meaning or purpose...
> > Note that I'd include being printed out on paper and read
> > by a human as an application. The 'human' data model of XML
> > will, at heart, be a woolly mix of SAX and DOM, since one
> > will probably look at a small snippet of a few nested
> > elements on a few lines as a single tree-structure unit, but
> > will otherwise read the file from top to bottom ;-)
> > I mean, a string of bits is an encoding, although to decide
> > 'of what' you'd need to (in general) hunt down where it came
> > from and ask, unless it was an encoding you happened to
> > recognise.
> Having officially embraced Namespaces and the Infoset, XML
> years ago forfeited its defenses against the 'abstract
> syntax' arguments of the ASN.1 partisans. The only meaningful
> way to draw a line separating XML from the abstract syntax
> camp is to insist that--at the level of the parseable
> document--there is no equivalence except
> character-by-character lexical equivalence. That is a
> fundamental distinction, and one worth insisting upon for the
> extraordinarily useful consequences it delivers. Now,
> granted, the simplest of compliant XML 1.0 parsers will make
> no distinction between single and double quotes delimiting
> attribute values nor between various manifestations of
> whitespace, but accepting such abstract equivalence is the
> price of compromise necessary to have an agreed XML 1.0
> Recommendation implementable in a parser. Choose stricter
> premises and you will find it necessary to implement
> something like Simon St.Laurent's Ripper. However, for the
> benefits of a global XML community and the bounty of
> general-purpose tools it produces, many of us have accepted
> those particular assertions of abstract equivalences which
> are incorporated in the XML 1.0 Recommendation. We accept
> them as specific, enumerated exceptions to an otherwise
> prevailing rule, and as identified exceptions they in fact
> confirm the existence of that rule in their absence.
> Namespaces, even if we accept only the simplest argument for
> them as a mechanism of disambiguation, necessarily imply that
> there is a level of abstract equivalence at which lexically
> identical GIs must be disambiguated as belonging to different
> (abstract) vocabularies, which implies the converse, that
> lexically distinct GIs may in fact be understood as
> equivalent once it is accepted that they are separate
> manifestations within different namespaces of a single
> abstraction. If we embrace namespaces this abstract
> equivalence becomes a fundamental rule which displaces the
> very different premise of XML 1.0. Likewise the Infoset,
> however light an abstraction of the instance syntax its
> original proponents insisted on, introduces abstraction as
> the general rule of which lexically variant instances are
> equivalent manifestations. Reversing the fundamental
> assumption of a lexically grounded XML, that abstraction is a
> slippery slope on which there is no meaningful point to draw
> a distinction between XML and any platonic abstract syntax
> such as ASN.1 and the like.
> I have made these arguments here before, for a number of
> years now. I apologize if my emphasis on these points becomes
> tiresome, but I think that the struggle of fifty years or
> more which classical philology required in the 20th century to
> understand the nature of oral poetry demonstrates why the
> physical, rather than the abstract, nature of a text is worth
> insisting upon. Texts can and do 'encode' physical
> properties, among them rhythm, scansion and various devices
> of assonance. Once encoded in an instance text, these
> qualities are inherent: they require neither markup nor other
> metadata to impute them, nor can those properties be removed
> from the text by instructions of markup. Beginning from the
> concrete instance gives us these properties in an unambiguous
> and clearly perceived form. On the other hand, to begin from
> an abstraction of syntax is to forfeit a concrete means of
> conveying those properties and to be forced to rely on
> metadata--that is, metadata to the abstraction!--if they are
> to be communicated at all. And of course any recipient of
> that metadata is free to ignore it in realizing some physical
> instantiation from the text. For those cases I must care most
> about, ASN.1 and abstract syntax generally are incapable of a
> precise and unambiguous encoding of inherent fundamental
> textual properties without resorting to a priori agreements
> between the creator and the consumer of a document, and from
> the very nature of document processing such agreements are
> unreliable and negligible.
> This is the fundamental distinction of document and data to
> which all permathreads return, but which I think the recent
> championing of ASN.1 on xml-dev gives us a useful new
> perspective on. Can't we now assert that what is
> fundamentally data is that of which the most salient
> properties are abstract? That is, different lexical
> manifestations are understood by both their creator and their
> consumer to be secondary to some abstract underlying platonic
> reality and, conversely, the physical qualities which might
> be inherent in a particular lexical manifestation are
> understood by both creator and consumer to be spurious and
> negligible. The content of documents, on the other hand, most
> specifically includes, often as the chief concern, those
> characteristics which come with the lexical manifestation and
> cannot be purged from the physical realization.
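Both abstract equivalences discussed above - a conforming parser's indifference to quote style and intra-tag whitespace, and namespace resolution making lexically distinct GIs equivalent - can be seen concretely. Here is a minimal sketch using Python's standard-library ElementTree; the element names and the urn:example namespace are invented purely for illustration:

```python
import xml.etree.ElementTree as ET

# Two lexically different documents: quote style and whitespace
# inside the start-tag differ, yet a conforming parser reports
# the same element structure and attribute value for both.
doc_a = ET.fromstring('<greeting lang="en">hi</greeting>')
doc_b = ET.fromstring("<greeting   lang='en'  >hi</greeting>")
assert doc_a.get("lang") == doc_b.get("lang") == "en"

# Namespaces: lexically distinct prefixes resolve to the same
# universal name, so two different-looking GIs turn out to be
# equivalent at the abstract level described above.
ns_a = ET.fromstring('<x:p xmlns:x="urn:example"/>')
ns_b = ET.fromstring('<y:p xmlns:y="urn:example"/>')
assert ns_a.tag == ns_b.tag == "{urn:example}p"
```

Any compliant XML 1.0 + Namespaces processor, not just ElementTree, is obliged to report these pairs as equivalent - which is exactly the concession to abstraction being discussed.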
I like much of what you say above.
I don't know if it is possible to build a general theory of representation
of abstract data in XML independently from any particular data language.
Maybe it is possible, but I think it would be very difficult to achieve
general agreement on such a theory. The reason for the never-ending
discussions in this area may be that there is no significant shared and
agreed set of concepts to support the discussion in general terms.
On the other hand, things are much easier to handle in terms of a particular
data language (such as ASN.1) using the conceptual framework of that data
language. For example, it is probably meaningless to say something like
"XML is an encoding" outside the terms of any particular data language, but
it is very clear what this sentence means if you find it in a paper about
ASN.1.
For example, when people in a telecomms standards group meet to specify a
videoconferencing protocol in ASN.1, they use a common set of concepts.
They know what they mean by "abstract type definitions" that get encoded
in PER or perhaps in XML. They see XML and PER as possible encodings of an
abstract syntax, because they are using ASN.1.
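That viewpoint can be sketched without a real ASN.1 toolchain. In the following toy Python example, the Resolution type, its field names, and the two-byte integer layout are all invented, and the binary form is merely PER-like in spirit (compact, schema-driven, delimiter-free), not actual PER; it shows one abstract value with two wire encodings:

```python
import struct
import xml.etree.ElementTree as ET

# One "abstract value": a pair of bounded integers.
value = {"width": 640, "height": 480}

# Encoding 1: compact binary - two big-endian unsigned 16-bit
# integers, no delimiters; field order is fixed by the imagined
# schema, much as PER relies on the ASN.1 module.
binary = struct.pack(">HH", value["width"], value["height"])

# Encoding 2: XML - verbose and self-describing, but carrying
# the same abstract value.
root = ET.Element("Resolution")
ET.SubElement(root, "width").text = str(value["width"])
ET.SubElement(root, "height").text = str(value["height"])
xml_form = ET.tostring(root)

# Decoding either form recovers the same abstract value.
w, h = struct.unpack(">HH", binary)
parsed = ET.fromstring(xml_form)
assert (w, h) == (value["width"], value["height"])
assert int(parsed.findtext("width")) == w
assert int(parsed.findtext("height")) == h
```

From this vantage point the XML document is just one encoding among several; from the opposite vantage point, the four bytes of the binary form have forfeited every lexical property the XML instance carries.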
I am sure there are many other examples of people using some other
data language or data model or whatever, which uses XML as the "on the wire"
representation of "something else" more abstract. I don't think that you or
others actually question this.
So the problem is not whether XML is or is not an encoding of something
else. The problem is that it proves impossible to make this kind of
assertion in general terms. Such an assertion is neither true nor false -
just meaningless. We lack a common "metalanguage" in which this discussion
can take place.
We don't have to have one, of course. A particular data language, among
other things, is a tool for thinking and communicating. It is something
that you pick up and agree on using for a particular purpose or need.
*Then* you can talk meaningfully about things.
This leads me to the conclusion that, when you talk about XML outside the
terms of any particular data language or data model - which is absolutely
reasonable - the only thing you have is XML. There is no emergent property,
no transcendent semantics, nothing platonic - just XML. Talking about
"abstract" things without the support of a precise conceptual framework is
ultimately meaningless.
On the other hand, if you say, "now let's use the conceptual framework of
ASN.1 and see what happens", then you find yourself thinking and talking in
terms of abstract syntax, which can be encoded in XML or in PER or in
BER.... You will still have XML documents that aren't the encoding of
anything, but the focus will be on the abstract syntax.
So the distinction between "document" XML and "data" XML probably does not
lie in any transcendent or emergent property of the XML instance.
Perhaps it lies in the circumstance that people use data languages and tools
that focus on "something else" that is represented as XML. (In the simplest
cases, there is no special language or tool being used but the application
code itself may be providing the abstract "thing" that is represented as XML
on the wire.) But the very terminology for describing this fact is
language-specific, **external** to XML. XML just doesn't care. This may be
one reason why we have the permathreads.
> Walter Perry
> The xml-dev list is sponsored by XML.org
> <http://www.xml.org>, an initiative of OASIS
The list archives are at http://lists.xml.org/archives/xml-dev/