- To: "Don Box" <dbox@microsoft.com>
- Subject: The supersetting has begun [Was Re: Opaque data, XML, and SOAP]
- From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Date: Thu, 27 Feb 2003 07:53:17 -0500
- Cc: xml-dev@lists.xml.org
- In-reply-to: <57EF69AF56D92148984EDA3174082945040F99C2@RED-MSG-10.redmond.corp.microsoft.com>
- References: <57EF69AF56D92148984EDA3174082945040F99C2@RED-MSG-10.redmond.corp.microsoft.com>
At 10:45 PM -0800 2/26/03, Don Box wrote:
>A few of us have spent some time thinking about the problem space
>and wrote down our thoughts in this area:
>
>http://www.xml.com/pub/a/2003/02/26/binaryxml.html
I'm still wrapping my head around this, but after several pages of
exposition what we come to is:

    How, then, can one keep the Infoset model consistent without
    encountering these drawbacks? We believe that the answer lies in
    standardized transformations of the Infoset, or in representations
    of it. In this fashion, the message can be considered as an Infoset,
    yet avoid the penalties of actually encoding and decoding the data
    with base64 or hexadecimal text.

    To illustrate this, consider the XInclude [XInclude] mechanism,
    which allows one to create a synthetic Infoset by merging two
    Infosets, or by merging plain text content into an Infoset. This
    latter use provides for an interesting possibility; what if XInclude
    were to allow inclusion of binary data, to be transformed to base64
    or hexBinary-encoded data in the resulting synthetic Infoset? This
    would allow binary data to be integrated into the Infoset whilst
    still being serialized as raw octets.

    Yet another approach would be to define an alternate serialization
    of the Infoset (much as WBXML [WBXML] has done) that serializes
    xs:base64Binary and xs:hexBinary typed data as the actual octets,
    rather than text-encoded octets. One can easily imagine
    on-disk/on-wire representations that allow opaque data to coexist as
    raw octets with the encoded characters of the surrounding XML
    structured data.

If I'm reading this right (and I'm not yet sure I am), then the
authors propose a new serialization of the Infoset which allows
embedded binary data to be directly or indirectly included in XML
documents. This is the most incompatible subset I've seen yet, in
that it allows what are now malformed documents. Hmm, now that I
think of it, it's really a superset, not a subset, isn't it?
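
To make the well-formedness point concrete, here is a minimal Python
sketch. It is an illustration only, not anything from the article: the
<data> element name and the five-byte payload are made up. It feeds an
ordinary XML 1.0 parser the same octets twice, once raw and once
base64-encoded.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical payload: a few raw octets, including NUL and other bytes
# that are not legal characters in an XML 1.0 document.
payload = bytes([0x00, 0x01, 0xFF, 0xFE, 0x3C])


def is_well_formed(document: bytes) -> bool:
    """Return True if a standard XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Raw octets dropped directly into element content.
raw_doc = b"<data>" + payload + b"</data>"

# The same octets carried as base64 text, as XML requires today.
encoded_doc = b"<data>" + base64.b64encode(payload) + b"</data>"

raw_ok = is_well_formed(raw_doc)
encoded_ok = is_well_formed(encoded_doc)

# The base64 form round-trips back to the original octets.
round_trip = base64.b64decode(ET.fromstring(encoded_doc).text)
```

The parser rejects the raw form and accepts the base64 form, which is
why a serialization that permits raw octets admits documents that
existing parsers treat as malformed.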
Side note: the indirect, XInclude approach appears to be subject to
all the security issues that bedevil DOCTYPEs. If SOAP wants to toss
DOCTYPEs because of external entity resolution, I don't see how they
can consider XInclude.
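
A small sketch of why the two look alike from a security standpoint,
using Python's xml.etree.ElementInclude. The Envelope/Body message and
the file: URL are hypothetical, and the loader is a stand-in that only
records what it is asked to fetch instead of reading anything. The
point is that the document, not the receiver, chooses which resource
gets dereferenced, just as with an external entity.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Every (href, parse) pair the XInclude processor asks for ends up here.
requested = []


def recording_loader(href, parse, encoding=None):
    """Stand-in loader: record the request and return placeholder text."""
    requested.append((href, parse))
    return "[contents of " + href + "]"


# Hypothetical incoming message; the sender picks the href.
message = """<Envelope xmlns:xi="http://www.w3.org/2001/XInclude">
  <Body><xi:include href="file:///path/to/secret.txt" parse="text"/></Body>
</Envelope>"""

root = ET.fromstring(message)

# Resolve xi:include elements in place using the recording loader.
ElementInclude.include(root, loader=recording_loader)

body_text = root.find("Body").text
```

With the default loader in place of recording_loader, the processor
would try to open whatever the sender named, so a receiver that
resolves XInclude has to restrict hrefs in the same way it would
restrict external entities.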
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| Processing XML with Java (Addison-Wesley, 2002) |
| http://www.cafeconleche.org/books/xmljava |
| http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.cafeconleche.org/ |
+----------------------------------+---------------------------------+