On a related matter: some people recently suggested that allowing C0
directly would make incorrect encoding labels harder to detect.
0x00 excepted, I am not aware of any case where this is so. In EBCDIC
and ASCII-family encodings, the bytes 0x01-0x1F do not (ever?) appear
as part of sequences representing other characters. I am interested in
any counterexamples people have uncovered in this regard.
Contrast this with the C1 characters: U+0080-U+009F. There
are many encodings which use the bytes 0x80-0x9F (alone or in sequence)
to represent non-control characters.
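As a sketch of that detection opportunity (my own illustration, not from this post): in windows-1252 the bytes 0x80-0x9F encode printable characters such as curly quotes, so their appearance in data labelled ISO-8859-1, where they would be C1 controls, is a strong hint that the label is wrong.

```python
# Heuristic sketch: bytes 0x80-0x9F are C1 controls in ISO-8859-1 but
# printable characters in windows-1252, so their presence in text
# labelled ISO-8859-1 usually signals a mislabelled encoding.
# (Function name is mine, for illustration only.)
def looks_mislabelled_latin1(data: bytes) -> bool:
    """Return True if data labelled ISO-8859-1 contains C1 bytes."""
    return any(0x80 <= b <= 0x9F for b in data)

# windows-1252 "smart quotes" (0x93/0x94) wrongly labelled ISO-8859-1:
sample = '\u201cquoted\u201d'.encode('windows-1252')
print(looks_mislabelled_latin1(sample))  # True
```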
So the argument against literal C0 characters is that in-band controls
are transmission artifacts that have no place appearing literally in
data. Use a character reference to get the character but escape the
control semantic.
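A minimal sketch of that advice (the function and its details are my own, not from any spec text here): rewrite literal C0 controls as numeric character references so the character survives while the in-band control semantic does not. Note that U+0000 remains forbidden in XML 1.1 even as a reference.

```python
def escape_c0(text: str) -> str:
    """Replace literal C0 controls (other than tab/LF/CR) with XML
    numeric character references, e.g. '\x01' -> '&#x1;'.
    Illustrative sketch only; U+0000 is not handled because XML 1.1
    forbids it even as a character reference."""
    keep = {'\t', '\n', '\r'}
    return ''.join(
        f'&#x{ord(ch):X};' if ord(ch) < 0x20 and ch not in keep else ch
        for ch in text
    )

print(escape_c0('field1\x01field2'))  # field1&#x1;field2
```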
The argument against literal C1 characters is that taking advantage of
code-point redundancy is our only opportunity to detect many
kinds of mislabelled encodings. The number of people making
documents who would benefit from this greatly outnumbers the
number of people who have to send data with literal C1 characters.
(Indeed, because there is no XML Infoset difference between a literal
character and a numeric reference, anyone who requires that certain
characters be represented literally goes beyond common XML
anyway.) Use references to get the character without foregoing the
chance to detect mislabelled encodings.
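The Infoset point is easy to check; a minimal sketch with Python's standard parser (a tab stands in for any referenced character): a literal character and its numeric reference parse to identical text.

```python
import xml.etree.ElementTree as ET

# A literal tab and the numeric reference &#x9; are indistinguishable
# after parsing: the Infoset records only the character itself.
literal = ET.fromstring('<a>x\ty</a>')
referenced = ET.fromstring('<a>x&#x9;y</a>')
print(literal.text == referenced.text)  # True
```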
XML 1.1 is better than XML 1.0 in these two cases, IMHO.
I believe the rationales for C0 and C1 to be good engineering, rather
than goodwill to Ethiopians (who surely need our goodwill.) XML
must fit in with IETF protocols concerning C0; the C0 controls may
indeed be obsolete, but the place to fix that is in the RFCs. Good
engineering also dictates that we detect problems as close as possible
to the source; hence the C1s.