   negotiating XML (long)

  • From: "Simon St.Laurent" <simonstl@simonstl.com>
  • To: XML-Dev Mailing list <xml-dev@xml.org>
  • Date: Thu, 06 Jul 2000 09:54:26 -0400

Over on SML-DEV, I just wrote:
>I'd like to suggest that the days of 'accept generously, send
>conservatively' are coming to a close, and that maybe it's time to accept
>conservatively and expect the sender of documents to figure out why they
>don't work.

This set off another train of thought that seems more appropriate to this
list.

Right now we have a number of varieties of (conforming) XML parsers with
different feature sets:
	* validating/non-validating
	* loads external entities/doesn't
	* loads external DTD subset/doesn't
	* namespace aware/not
	* supports encodings beyond UTF-8 and UTF-16/doesn't
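
As a rough illustration of how these differences surface in practice, the
sketch below asks a SAX parser which of a few such features it is willing to
switch on. It assumes Python's standard xml.sax module and whatever parser
make_parser() returns on the local system; the probe_features helper and the
short labels are hypothetical names used only for this example, and the
answers will vary from parser to parser.

```python
import xml.sax
from xml.sax import handler
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException

# Hypothetical short labels mapped to the standard SAX2 feature names.
FEATURES = {
    "namespace aware": handler.feature_namespaces,
    "validating": handler.feature_validation,
    "loads external general entities": handler.feature_external_ges,
    "loads external parameter entities": handler.feature_external_pes,
}

def probe_features():
    """Return {label: bool}: can the default parser enable each feature?"""
    results = {}
    for label, feature in FEATURES.items():
        # A fresh parser per feature, so one probe cannot affect the next.
        parser = xml.sax.make_parser()
        try:
            parser.setFeature(feature, True)
            results[label] = True
        except (SAXNotRecognizedException, SAXNotSupportedException):
            # The parser either does not know the feature or cannot enable it.
            results[label] = False
    return results

for label, supported in probe_features().items():
    print(f"{label}: {'yes' if supported else 'no'}")
```

Nothing is parsed here; the probe only reports what the parser says about
itself, which is the kind of information a receiver would need to publish.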

We have more Infoset-affecting features (XInclude, XBase, XML Schemas) on
the way, as well as features (digital signatures) that depend on document
content being just right.

It seems that the philosophy most frequently advocated for handling all of
these cases is the use of MEGAPARSER, a wild piece of software that can
handle all of these situations and ensure that your application correctly
and completely understands whatever John Q. User sends.  It's that "send
conservatively, but accept generously" thing.

I'm not sure that's actually a good practice.  Apart from requiring
ever-larger XML processors, it makes it more difficult to use XML reliably
in situations where performance is important - and there are more and more
of those every day.  There have been some moves in the data binding
communities affiliated with XML to create what amount to special-purpose
parsers that accept only one kind of document to improve performance, as
well as suggestions on SML-DEV that creating parsers supporting a clearly
specified subset of XML may be appropriate for certain situations.

I've tackled these problems a number of different ways:
	* Created XML Processing Description Language (XPDL) [1] to
	   describe processing requirements for XML document classes,
	   making it possible for parsers to bow out gracefully if
	   they lack needed capabilities.
	* Worked (with a great crew of people on SML-DEV) to describe
	   Common XML [2], a redescription of XML that provides warning
	   messages to XML document authors about which features may or
	   may not be supported by various XML parsers, in the hope that
	   authors will understand the distinction between the 'safe core'
	   and features that may cause problems.
	* Presented [3] explanations of XML 1.0 that focus on its options,
	   rather than on its dream of fully interoperable syntax.

While I'm glad to have done all of these things, and plan to continue, I'm
concerned that it's really time for these issues to become part of an
infrastructure rather than best practices.  It seems like it should be
possible for applications to negotiate with each other to ensure that
similar expectations for XML document content are met and that XML
documents are in fact reliable containers of information.

To some extent, canonicalization is an answer - if everyone ships around
canonicalized documents, we can all rest easier.  Unfortunately, that
doesn't seem likely or necessarily appropriate in all cases.  While it's a
useful answer, it's one that can cost information at times and doesn't
answer every question.

A few of the items I'd love to be able to ask for include:
	* A version of the document with the internal and 
	   external subsets of the DTD normalized and stored 
	   in the internal subset
	* A version of the document with all entities expanded
	* A version of the document where all characters in a
	   certain range are described using character references
	* A version of the document where all attribute values are
	   defaulted from the DTD, but all entities are left unexpanded
	* A version of the document in XYZ character encoding
	* A version of the document that is validated by the sender

This same work could get into semantic issues, perhaps referencing
transformations to provide the core information in any of a variety of
different formats or using particular namespaces.  All of this crosses over
into content negotiation [4], CC/PP [5], and XML packaging.  (It's not MIME
types, as we've gone too deep already.)

I'm not sure that MEGAPARSER is a huge problem for larger applications, but
cases where network dependencies must be minimized, processing kept simple
(think embedded systems), or reliable transfers guaranteed all seem like
they might be common beneficiaries of such an infrastructure.  

At its simplest, a parser might just have a description file that gets sent
along with a file request, stating flatly that it must have its XML in a
particular style.  A more complex approach might permit the two sides to
figure out how best to transfer information.
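
To make the simple case concrete, here is a minimal sketch of a receiver's
description being checked against what a sender can do. Everything in it is
an assumption made for illustration: the Requirements and SenderCapabilities
classes, their field names, and the negotiate function are a hypothetical
vocabulary, not XPDL, conneg, or CC/PP.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirements:
    """What a receiver insists on (hypothetical vocabulary)."""
    entities_expanded: bool = False
    validated_by_sender: bool = False
    encoding: str = "UTF-8"

@dataclass(frozen=True)
class SenderCapabilities:
    """What a sender is able to do before shipping a document."""
    can_expand_entities: bool
    can_validate: bool
    encodings: frozenset

def negotiate(req, caps):
    """Return the unmet requirements; an empty list means the sender
    can deliver the document in the requested style."""
    unmet = []
    if req.entities_expanded and not caps.can_expand_entities:
        unmet.append("entities_expanded")
    if req.validated_by_sender and not caps.can_validate:
        unmet.append("validated_by_sender")
    # Compare encoding names case-insensitively.
    if req.encoding.upper() not in {e.upper() for e in caps.encodings}:
        unmet.append(f"encoding={req.encoding}")
    return unmet

embedded = Requirements(entities_expanded=True, encoding="UTF-8")
server = SenderCapabilities(True, False, frozenset({"UTF-8", "ISO-8859-1"}))
print(negotiate(embedded, server))  # nothing unmet, so an empty list
```

A non-empty result is where the more complex approach would begin: instead
of refusing outright, the two sides could look for a style both can handle.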

Does this kind of collaborative and/or negotiated processing interest
people? Or is everyone happy with the current stew of possibilities, and
using documentation and intervention to sort things out? 

I'm pondering creating a simple vocabulary for use with the IETF's content
negotiation work, but we'll see if/where I can find the time.

[1] - XPDL - 
[2] - Common XML - 
[3] - Interoperability - 
[4] - conneg working group (IETF) - 
[5] - CC/PP (Composite Capabilities/Preference Profiles) - 

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
http://www.simonstl.com - XML essays and books


