> From: Joshua Allen [mailto:joshuaa@microsoft.com]
> Sent: Friday, March 22, 2002 11:30 PM
> To: michael.h.kay@ntlworld.com; Rick Jelliffe; xml-dev@lists.xml.org
> Subject: RE: [xml-dev] MSXML DOM Special Chars Less Than 32
>
>
> > I don't want to dumb XML down. But we do sometimes need to store data
> > (e.g. WebDAV property values) which can potentially contain characters
> > that are not permitted in XML. In fact, it's very unlikely that a
> > WebDAV property value will contain such a character, but the software
> > still needs to allow for the possibility.
>
> Why would someone want to use XML if they need to transmit illegal
> characters? There are usually two cases -- one is where the illegal
> characters are insignificant, in which case they can be stripped and the
> output is well-formed XML. The other case is where the illegal
> characters *are* significant, and must be preserved for round-trip. But
> if someone wants to round-trip characters that are clearly not permitted
> by any XML processor in the world, why use XML? That's like getting mad
> because a car won't float.
That's a bit like saying that XML should not be used as a marshalling
format when arbitrary strings are sent around. So should SOAP and WebDAV
be changed?
The problem is that when these protocols were designed, XML's restricted
concept of character data apparently wasn't considered.
For instance:
1) What will MS Sharepoint Server do when a property name starts with a
leading digit, and a WebDAV PROPFIND request asking for "all" properties
comes in? (Answer: it sends non-well-formed XML response bodies, breaking
every compliant XML processor / WebDAV client in the world. Interestingly
enough, Microsoft's own clients "handle" this.)
2) What is a WebDAV server supposed to do if it's actually accessing a
backend system it doesn't control entirely, and if a property value contains
control characters other than CR, LF or TAB? Your choices are: a) fail the
request, b) drop the offending characters, c) invent a new marshalling
format that is still compatible with "xs:string".
> > arguments. I guess the C lobby is sufficiently entrenched that we'll
> > never allow &#0;, but apart from that I don't really see the need for
> > restrictions.
>
> But that is exactly the point: even if we started again from scratch,
> there exists a subset of characters that will end up being illegal.
> There will also exist a certain population of users who disagree with
> each illegal character choice. There will additionally be a certain
> population of implementers who disagree with the *permissiveness* of the
> characters, since it makes their lives difficult, and they have to
> handle characters in a way that is unnatural (NEL for Unix people, for
> example).
>
> So my point is that the set of illegal characters will always be an
> arbitrary value-judgment that tries to balance between implementers and
> users. I do not think it is an objective "there is one right answer"
> situation.
Agreed.
However, ignoring the issue doesn't exactly help either. Many
applications/protocols are stuck with the task of marshalling "arbitrary"
strings as XML (with datatype xs:string), so it would be good if there were
an XML-1.0 compliant, cross-protocol format to do this.
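Purely as an illustration of what such a format could look like (this is a hypothetical scheme, not anything SOAP, WebDAV or XML Schema defines), one could backslash-escape the characters XML 1.0 forbids. All characters outside the Char production lie in the BMP, so four hex digits are enough. The function names below are invented for this sketch.

```python
import re
import xml.etree.ElementTree as ET

# A backslash, or any character outside the XML 1.0 Char production.
_NEEDS_ESCAPE = re.compile(
    r"\\|[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# An escaped backslash, or \uXXXX as produced by encode_for_xml().
_ESCAPE = re.compile(r"\\(\\|u([0-9A-F]{4}))")


def encode_for_xml(value: str) -> str:
    """Map an arbitrary string to one containing only legal XML characters."""
    def repl(match):
        ch = match.group()
        return "\\\\" if ch == "\\" else "\\u%04X" % ord(ch)
    return _NEEDS_ESCAPE.sub(repl, value)


def decode_from_xml(text: str) -> str:
    """Inverse of encode_for_xml()."""
    def repl(match):
        if match.group(1) == "\\":
            return "\\"
        return chr(int(match.group(2), 16))
    return _ESCAPE.sub(repl, text)


def round_trip(value: str) -> str:
    """Serialize the encoded value as element content, parse it, decode it."""
    elem = ET.Element("prop")  # element name is arbitrary
    elem.text = encode_for_xml(value)
    parsed = ET.fromstring(ET.tostring(elem))
    return decode_from_xml(parsed.text)
```

Markup characters such as "<" and "&" are still left to the serializer's normal escaping. The catch is the one raised above: the encoded value is a valid xs:string, but the receiver only recovers the original if both sides agree on the convention.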