- To: "Rick Jelliffe" <email@example.com>,<firstname.lastname@example.org>
- Subject: RE: Why would MS want to make XML break on UNIX, Perl, Python, etc ?
- From: "Derek Denny-Brown" <email@example.com>
- Date: Thu, 20 Dec 2001 14:16:54 -0800
- Thread-index: AcGJEVRpOJmuvKJBTdyE9fPKYfT3yQAjxh7Q
- Thread-topic: Why would MS want to make XML break on UNIX, Perl, Python, etc ?
I have a proposal regarding control characters:
The argument that Rick has been spearheading is focused on the allowed
characters in the text stream to be parsed as XML. As a number of
people have argued, there are applications which desire to transport
control characters, or at least have that as an option. XML currently
makes their lives unusually hard because there is no standardized way to
do this. If the question is changed from:
Do we allow XML files to have control characters in their text?
to:
Do we want to support applications which wish to encode control
characters in their data?
I think most people would agree that the question becomes much less
controversial. Since the majority of arguments have been over the fact
that control characters have no standardized meaning (outside of the few
exceptions noted in one email... I forget by whom), this now strikes
some surprising chords in my brain with XML's handling of CR/NL. The
XML spec mandates that the CR/NL sequence be translated to a single
NL, and in this way defines a normalized view over the variety of
end-of-line behaviors. It _also_ allows a CR to be made explicit by
encoding it as a character reference. Now since XML has defined that
end-of-line is encoded as 0xA, what could this explicit 0xD mean? This is
very similar to the situation with control characters, so why
not fix it the same way, by allowing control characters only as
character references? This is awkward for those applications which
desire to (possibly) encode control characters, but is far less awkward
than having to define their own app-specific mechanism to encode such
characters, or base64 encoding the entire string. This works well with
text editors. This incurs no issues with TTY/stdio handling.
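To make the CR/NL analogy concrete, here is a minimal sketch. It assumes
Python's standard xml.etree.ElementTree module, which is an XML 1.0
parser and is only my choice for illustration; the helper name parses()
and the sample strings are made up. It shows the end-of-line
normalization, the explicit CR surviving as a character reference, the
current rejection of a control character in either form, and the base64
workaround mentioned above.

```python
import base64
import xml.etree.ElementTree as ET

# A literal CR LF pair in the source is normalized to a single LF (0xA),
# while a CR written as the character reference &#xD; reaches the application.
text = ET.fromstring(b"<a>one\r\ntwo&#xD;</a>").text
print(repr(text))  # 'one\ntwo\r'


def parses(doc: bytes) -> bool:
    """Hypothetical helper: True if the XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# XML 1.0 rejects a control character such as 0x01, whether it appears
# literally in the text or as a character reference.
literal_ok = parses(b"<a>\x01</a>")
reference_ok = parses(b"<a>&#x1;</a>")

# The app-specific workaround: base64 encode the entire string.
payload = "bell\x07and\x01more"
encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
elem = ET.Element("a")
elem.text = encoded
decoded_text = ET.fromstring(ET.tostring(elem)).text
roundtrip = base64.b64decode(decoded_text).decode("utf-8")
```

Under the proposal, the `&#x1;` form would be the one to become legal,
while the literal 0x01 byte in the text stream would stay forbidden.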
There have been a number of arguments that control characters have no
defined meaning. This proposal does not address that at all, because
that does not seem to be a real issue. The heated arguments are over
the textual format of XML, so let's address that specific problem, not
generalize and throw the baby out with the bathwater.
Yes, I work for Microsoft also (although not in the same group as
Michael Rys), and no I have no desire to see XML rendered broken on any
platform. I do have a lot of experience with trying to convince
non-XML/non-text oriented people that XML is a good solution to their
problems. That also means I have a good amount of experience with
people coming back to me surprised about certain aspects of XML.
Support for encoding control characters is one of these issues which has
cropped up numerous times. I can't claim that I always approve of how
people use XML, but I'd much rather that IIS stored its config info in
an XML file than a proprietary file format, and I think most people
would agree.
In order to facilitate this, XML _needs_ to loosen up some of its
restrictions. Goals such as these are what drive MS's input on updates
to XML (although I honestly have no idea what our specific input was, as
I am not directly involved with any W3C interactions). We are not out
to try and break everyone else; we just want to get our jobs done. The
very people Rick is blaming for trying to subvert XML are the people who
are
pushing to have MS use published standards as opposed to developing all
new solutions. Those are the very people that the community needs to
work _with_, not against.