OASIS Mailing List Archives





   The supersetting has begun [Was Re: Opaque data, XML, and SOAP]

  • To: "Don Box" <dbox@microsoft.com>
  • Subject: The supersetting has begun [Was Re: Opaque data, XML, and SOAP]
  • From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
  • Date: Thu, 27 Feb 2003 07:53:17 -0500
  • Cc: xml-dev@lists.xml.org
  • In-reply-to: <57EF69AF56D92148984EDA3174082945040F99C2@RED-MSG-10.redmond.corp.microsoft.com>
  • References: <57EF69AF56D92148984EDA3174082945040F99C2@RED-MSG-10.redmond.corp.microsoft.com>

At 10:45 PM -0800 2/26/03, Don Box wrote:
>A few of us have spent some time thinking about the problem space 
>and wrote down our thoughts in this area:

I'm still wrapping my head around this, but after several pages of 
exposition what we come to is:

	How, then, can one keep the Infoset model consistent without
	encountering these drawbacks? We believe that the answer lies in
	standardized transformations of the Infoset, or in representations
	of it. In this fashion, the message can be considered as an Infoset,
	yet avoid the penalties of actually encoding and decoding the data
	with base64 or hexadecimal text.

	To illustrate this, consider the XInclude [XInclude] mechanism,
	which allows one to create a synthetic Infoset by merging two
	Infosets, or by merging plain text content into an Infoset. This
	latter use provides for an interesting possibility; what if XInclude
	were to allow inclusion of binary data, to be transformed to base64
	or hexBinary-encoded data in the resulting synthetic Infoset? This
	would allow binary data to be integrated into the Infoset whilst
	still being serialized as raw octets.

	Yet another approach would be to define an alternate serialization
	of the Infoset (much as WBXML [WBXML] has done) that serializes
	xs:base64Binary and xs:hexBinary typed data as the actual octets,
	rather than text-encoded octets. One can easily imagine
	on-disk/on-wire representations that allow opaque data to coexist as
	raw octets with the encoded characters of the surrounding XML
	structured data.

If I'm reading this right (and I'm not yet sure I am), then the 
authors propose a new serialization of the Infoset which allows 
embedded binary data to be directly or indirectly included in XML 
documents. This is the most incompatible subset I've seen yet, in 
that it allows what are now malformed documents. Hmm, now that I 
think of it, it's really a superset, not a subset, isn't it?
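
To make the "now malformed" point concrete, here is a minimal sketch 
using Python's standard library. The payload and the <data> element 
name are made up for illustration; nothing here comes from the 
proposal itself. It shows that a present-day XML parser rejects raw 
octets in element content, while the base64 form parses at the cost 
of a 4:3 size increase.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical opaque payload: 3072 arbitrary octets, including NUL,
# '<', '&' and bytes that are not valid UTF-8.
payload = bytes(range(256)) * 12


def is_well_formed(doc: bytes) -> bool:
    """Return True if a conforming XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Raw octets dropped straight into element content.
raw_doc = b"<data>" + payload + b"</data>"

# The same octets carried as base64 text, as XML requires today.
encoded = base64.b64encode(payload)
b64_doc = b"<data>" + encoded + b"</data>"

print("raw octets well-formed:", is_well_formed(raw_doc))
print("base64 well-formed:    ", is_well_formed(b64_doc))
print("payload size:", len(payload), "encoded size:", len(encoded))

# The base64 form round-trips back to the original octets.
decoded = base64.b64decode(ET.fromstring(b64_doc).text)
```

Any serialization that carries the raw form on the wire is therefore 
something an existing XML parser cannot read, which is the sense in 
which it is a superset.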

Side note: the indirect, XInclude approach appears to be subject to 
all the security issues that bedevil DOCTYPEs. If SOAP wants to toss 
DOCTYPEs because of external entity resolution, I don't see how they 
can consider XInclude.
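
A rough sketch of why the concern carries over, again in Python with 
the standard library's xml.etree.ElementInclude. The envelope, the 
href, and the recording loader are all hypothetical; the loader never 
touches the network or the file system. The point is only that the 
document's author chooses the URI and the receiving processor is the 
one asked to dereference it, just as with an external entity.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical incoming message; the href is chosen by the sender.
DOC = (
    '<Envelope xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<Body><xi:include href="http://internal.example/secret" parse="text"/></Body>'
    "</Envelope>"
)

requested = []  # every resource the processor was asked to fetch


def recording_loader(href, parse, encoding=None):
    """Stand-in loader: record the request instead of fetching anything."""
    requested.append((href, parse))
    return "[content of " + href + "]"


root = ET.fromstring(DOC)

# Expanding the includes calls the loader once per xi:include element.
ElementInclude.include(root, loader=recording_loader)

body_text = root.find("Body").text
print(requested)
print(body_text)
```

With the default loader in place of recording_loader, the receiver 
would open whatever local resource the sender named. A processor that 
refuses DOCTYPEs to avoid that would need an equivalent restriction 
here.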

| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|           Processing XML with Java (Addison-Wesley, 2002)          |
|              http://www.cafeconleche.org/books/xmljava             |
| http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA  |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |



Copyright 2001 XML.org. This site is hosted by OASIS