On Sat, 2002-03-23 at 10:59, Julian Reschke wrote:
> > From: Michael Kay [mailto:firstname.lastname@example.org]
> > Sent: Saturday, March 23, 2002 3:19 PM
> > To: 'Joshua Allen'; 'Rick Jelliffe'; email@example.com
> > Subject: RE: [xml-dev] MSXML DOM Special Chars Less Than 32
> > > Why would someone want to use XML if they need to transmit illegal
> > > characters?
> > A: "I want to replicate my WebDAV configuration. I want to do
> > this by encoding all the WebDAV properties in an XML file and
> > transmitting that over the network".
> > B: "You can't represent WebDAV properties in XML, because they
> > can contain characters that XML doesn't support"
> Actually, I'd rephrase that as:
> When defined as "plain text", a WebDAV property *by definition* can't have
> values outside the allowed XML character range. Where we (WebDAV server
> developers) get in trouble is when in reality, the WebDAV server is just a
> protocol adapter to some kind of back end system, which is NOT XML-based.
> Inevitably, we'll have to find an escaping format which is XML 1.0
> compliant, cheap and generally accepted. As this problem happens with
> XML-RPC and SOAP as well, it would be nice to have a single, widely accepted
> Some of the requirements are:
> - the format of strings that *can* be represented as XML characters doesn't
> - non-XML characters must be ignored by implementations not knowing the
> escaping mechanism
There is some point in this, in terms of using XML for transport of
unpredictably reliable legacy data.
Neither of the XML escape mechanisms can carry this information. I
think this is probably a good thing. Note that it isn't only XML that
makes such a restriction: most text-oriented network protocols over TCP
carry headers, which are typically defined per SMTP or SMTP+MIME,
meaning that the header names have an even more restricted set of
characters (subset of US-ASCII), and so do header values (larger subset,
still US-ASCII). Header values, though, permit at least two forms of
escaping worth investigation: quoted-printable and encoded-word.
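
As a rough illustration of those two header escapes, here is a minimal sketch using Python's standard library (`quopri` for quoted-printable, `email.header` for RFC 2047 encoded-word). The sample value `raw` is made up for the example; nothing here is specific to WebDAV or XML.

```python
import quopri
from email.header import Header, decode_header

# Hypothetical legacy value containing a C0 control character (U+0001).
raw = b"legacy\x01value"

# Quoted-printable: printable ASCII passes through, other bytes become =XX.
qp = quopri.encodestring(raw)
qp_roundtrip = quopri.decodestring(qp)

# Encoded-word: the whole value is wrapped as =?charset?encoding?text?=.
word = Header(raw.decode("ascii"), charset="utf-8").encode()
word_bytes, word_charset = decode_header(word)[0]

print(qp)
print(word)
```

Both forms come out as printable US-ASCII and decode back to the original bytes, which is the property of interest here.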
Now, is it an XML problem or an application problem? If it is regarded
as an XML problem, then XML could define a form of escape, similar to
one of the above, perhaps, which would allow such encoding. Since the
Unicode escape mechanism already exists, and would simply have to be
required to carry C0, that could be used. In my opinion, it's a bad
idea. I ought to be able to treat text as text; XML is text (anyone
else old enough to have gotten a VT100 escape codes mail bomb?).
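
For reference, a small sketch of how an XML 1.0 parser treats C0 characters today, using the expat-based parser bundled with Python's `xml.etree.ElementTree`. The three test documents are invented for the example; other conforming parsers should behave the same way, but that is an assumption rather than something checked here.

```python
import xml.etree.ElementTree as ET

# Three ways one might try to smuggle U+0001 into XML 1.0 content.
candidates = {
    "character reference": "<p>&#1;</p>",
    "CDATA section": "<p><![CDATA[\x01]]></p>",
    "literal character": "<p>\x01</p>",
}

rejected = {}
for label, doc in candidates.items():
    try:
        ET.fromstring(doc)
        rejected[label] = False
    except ET.ParseError:
        # The document is not well-formed, so parsing stops here.
        rejected[label] = True

# By contrast, the permitted C0 characters (tab, LF, CR) are accepted.
tab_text = ET.fromstring("<p>&#9;</p>").text

print(rejected)
```

All three attempts fail as well-formedness errors, so the data never reaches the application.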
In short, the C0 characters have no universal interpretation;
interpretation depends upon the application. It seems reasonable, then,
that the application can encode the bloody things too. Can't use XML
mechanisms. Base64, the usual suggestion, incurs an immense overhead.
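
To put a number on the overhead, a quick sketch comparing Base64 with quoted-printable on a mostly printable value, using Python's `base64` and `quopri` modules. The sample bytes are made up; the comparison reverses when most of the bytes are non-printable, since quoted-printable spends three characters on each of those.

```python
import base64
import quopri

# Hypothetical property value: printable ASCII plus a single C0 byte.
raw = b"WebDAV property with one control byte \x01 inside"

b64 = base64.b64encode(raw)    # every 3 input bytes become 4 characters
qp = quopri.encodestring(raw)  # only the control byte expands, to "=01"

print(len(raw), len(b64), len(qp))
```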
So, define an empty-restricted xsd:string type, app:quoted-printable or
app:encoded-word. Adopt and adapt existing algorithms for those
encodings. If you're not using schemata, adopt the usage of
xsi:type="app:quoted-printable". That doesn't help for attribute
values, but it does address elements. Encoded-word seems somehow more
appropriate for attribute values anyway.
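
A minimal sketch of the element case, assuming Python's `quopri` and `xml.etree.ElementTree`. The namespace URI bound to `app`, the element name `displayname`, and the helper names `encode_property` and `decode_property` are all hypothetical, and the `xsi:type` check compares the literal prefix rather than resolving the QName, which a real implementation would need to do.

```python
import quopri
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
APP_NS = "urn:example:app"  # hypothetical namespace for the app: types


def encode_property(name, raw):
    """Serialize raw bytes as an element flagged app:quoted-printable."""
    elem = ET.Element(name)
    # Declare the app prefix by hand; it is only used inside an attribute value.
    elem.set("xmlns:app", APP_NS)
    elem.set(f"{{{XSI_NS}}}type", "app:quoted-printable")
    elem.text = quopri.encodestring(raw).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def decode_property(xml_text):
    """Return the original bytes, decoding only when the type flag is present."""
    elem = ET.fromstring(xml_text)
    text = elem.text or ""
    if elem.get(f"{{{XSI_NS}}}type") == "app:quoted-printable":
        return quopri.decodestring(text.encode("ascii"))
    return text.encode("utf-8")


xml_text = encode_property("displayname", b"legacy\x01value")
print(xml_text)
print(decode_property(xml_text))
```

The XML layer only ever sees printable ASCII; a receiver that ignores `xsi:type` still gets well-formed text. Note that `quopri` passes CR and LF through unencoded, so an adapted algorithm would have to decide how to treat line endings, which XML parsers normalize.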
Application encodes and decodes, using a set of characters even more
strongly limited than XML's, and indicating need via schema or the
in-line xsi:type indicator or prior agreement per-element and
Amelia A. Lewis firstname.lastname@example.org email@example.com
"How does one hate a country, or love one? ... What is love of one's
country; is it hate of one's uncountry? Then it's not a good thing. Is
it simply self-love? That's a good thing, but one mustn't make a virtue
of it, or a profession."
-- Therem Harth rem ir Estraven