xml-dev - Re: [xml-dev] Some comments on the 1.1 draft

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Some comments on the 1.1 draft
From: Gavin Thomas Nicol <gtn@rbii.com>
Date: Wed, 19 Dec 2001 11:42:09 -0500
In-reply-to: <20011219182436.L9114@io.mds.rmit.edu.au>
Organization: Red Bridge Interactive, Inc.
References: <5C39F806F9939046B4B1AFE652500A3A251914@RED-MSG-10.redmond.corp.microsoft.com> <021401c1885b$2ec26da0$4bc8a8c0@AlletteSystems.com> <20011219182436.L9114@io.mds.rmit.edu.au>

On Wednesday 19 December 2001 02:24 am, Alan Kent wrote:
> To separate the two issues - I have no opinion on name characters.
> PCDATA, however, is different. I read through your entire post twice
> and must admit I still don't quite understand what your point is
> exactly. I *think* you might be saying "it's good to specify the
> encoding because that way it's possible to make sure characters
> not valid in that encoding are rejected." (My reading of the XML spec
> is that 0x85 is legal in the Unicode character set - that is, it's
> not marked as UNUSED in the good old SGML jargon.)
>
> If this is your point, then would it be possible to define a new
> encoding which permitted the full range of Unicode characters
> (including control characters which are valid in Unicode)?
> Would that address your issues?

The point is that characters != bytes != encoding. If you start allowing
control characters (which are somewhat debatable *as* characters in the first
place), it becomes very easy to abuse that power and to end up with
application-specific uses of embedded encodings. This is effectively what
Mr. Rys from Microsoft wanted: the ability to store arbitrary binary streams
inside XML-encoded data.
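
To make the "characters != bytes != encoding" distinction concrete, here is a
minimal Python sketch. It is an illustration added for this discussion, not
anything from the XML 1.1 draft; the choice of U+0085 follows the 0x85 example
in the quoted message, and the variable names are hypothetical.

```python
# One character, U+0085 (NEXT LINE), serialized under three encodings.
nel = "\u0085"

utf8_bytes = nel.encode("utf-8")       # two bytes: C2 85
utf16_bytes = nel.encode("utf-16-be")  # two bytes: 00 85
latin1_bytes = nel.encode("latin-1")   # one byte:  85

# The reverse direction: one byte, 0x85, read under two encodings
# yields two different characters.
as_latin1 = b"\x85".decode("latin-1")  # U+0085 NEXT LINE
as_cp1252 = b"\x85".decode("cp1252")   # U+2026 HORIZONTAL ELLIPSIS

for label, data in [("utf-8", utf8_bytes),
                    ("utf-16-be", utf16_bytes),
                    ("latin-1", latin1_bytes)]:
    print(label, data.hex())
```

The same character has three different byte forms, and the same byte names two
different characters, so "0x85" means nothing until an encoding is stated.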

The problem is that XML is *text*. It is made from *characters*, and 
arbitrary binary strings have no place in it. Once you change that, you have 
essentially ruined XML as a textual markup language.

People could say that NUL et al. are still *characters* and so would be fine, 
even in UTF-8 encoded documents, but I bet they'd be rather unhappy to find 
their binary streams changing if I saved the document as UTF-16.
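
A rough Python sketch of that failure mode, under the assumption that someone
smuggles a binary blob into a document by mapping each byte to the character
with the same code point. The payload and the helper name `survives` are
hypothetical, and a real XML processor would reject NUL long before this point.

```python
# Hypothetical 4-byte binary payload, including NUL and a C1 control.
payload = bytes([0x00, 0x01, 0x85, 0xFF])

# Map each byte to the character with the same code point (U+0000..U+00FF).
text = payload.decode("latin-1")

# Save the same characters under two different encodings.
on_disk_utf8 = text.encode("utf-8")
on_disk_utf16 = text.encode("utf-16-le")


def survives(data: bytes) -> bool:
    """Return True if the original byte sequence appears verbatim in data."""
    return payload in data


# The characters round-trip in both cases...
same_chars = (on_disk_utf8.decode("utf-8") == text
              and on_disk_utf16.decode("utf-16-le") == text)

# ...but the bytes on disk differ, and neither form contains the payload.
print(same_chars, on_disk_utf8.hex(), on_disk_utf16.hex())
print(survives(on_disk_utf8), survives(on_disk_utf16))
```

The document is the same at the character level in both files, which is all
XML promises. Anyone reading the payload back at the byte level gets something
different each time.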

The point here is that binary data carried this way is unreliable: it survives
only until someone re-encodes the document.

Follow-Ups:
- RE: [xml-dev] Some comments on the 1.1 draft
  - From: "J C Theriot" <theriot@posc.org>

References:
- RE: [xml-dev] Some comments on the 1.1 draft
  - From: "Michael Rys" <mrys@microsoft.com>
- Re: [xml-dev] Some comments on the 1.1 draft
  - From: "Rick Jelliffe" <ricko@allette.com.au>
- Re: [xml-dev] Some comments on the 1.1 draft
  - From: Alan Kent <ajk@mds.rmit.edu.au>
