xml-dev - Re: [xml-dev] Re: Production 78 / Process failure in XML 1.1



To: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Subject: Re: [xml-dev] Re: Production 78 / Process failure in XML 1.1
From: Rick Jelliffe <ricko@allette.com.au>
Date: Wed, 12 Nov 2003 02:31:04 +1100
Cc: xml-dev@lists.xml.org
In-reply-to: <p06002011bbd61d39d949@[192.168.254.4]>
References: <20031109204707.GA6478@mercury.ccil.org> <p06002011bbd462a34f96@[192.168.254.4]> <20031109221420.GA15695@mercury.ccil.org> <3FB06A75.8030809@allette.com.au> <p06002011bbd61d39d949@[192.168.254.4]>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3.1) Gecko/20030428

Elliotte Rusty Harold wrote:

> At 3:49 PM +1100 11/11/03, Rick Jelliffe wrote:
>
>> So the argument against literal C0 characters is that inband control
>> characters are transmission artifacts that have no place literally in
>> data. Use references to get the character but escape the control
>> semantic.
>
> There's also the argument that C0 controls may accidentally control 
> something. There are still a few old printers here and there that will 
> break a page on a form feed. There might even be some gateways that 
> use the C0's for other purposes. 

I know of people still using serial terminals (perhaps with Xon/Xoff).
One place in Taiwan that had computerized early still had terminals,
because their mainframes used character sets that terminal emulation
programs did not accept. They must wait till their mainframe
applications become obsolete before they can get rid of them. (But they
are not sending XML anyway.) Modems still sometimes use Xon/Xoff, but
because people run PPP etc. rather than sending files directly, control
characters in data are now not a problem for serial comms AFAIK. If XML
just uses application/ media types then I think there is no RFC problem
with literal controls.

Mislabelled UTF-16 encodings will always be detected in XML, because of
the presence of the 0x00 bytes and/or BOM.*

In UTF-8, all the bytes for characters > U+007F are bytes > 0x7F, so
again this will be detected.
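
To make the two detection arguments concrete, here is a minimal Python
sketch. The function names are made up for illustration, and the check
is only a heuristic under the assumptions stated in the comments, not
what any particular XML parser does.

```python
def high_chars_use_only_high_bytes(text: str) -> bool:
    """True if every character above U+007F encodes in UTF-8
    to bytes that are all above 0x7F."""
    for ch in text:
        if ord(ch) > 0x7F:
            if any(b <= 0x7F for b in ch.encode("utf-8")):
                return False
    return True


def shows_utf16_signs(data: bytes) -> bool:
    """Heuristic (an assumption of this sketch): a UTF-16 BOM or any
    0x00 byte gives away UTF-16 data labelled as UTF-8 or 8859-1."""
    has_bom = data[:2] in (b"\xff\xfe", b"\xfe\xff")
    return has_bom or b"\x00" in data
```

Any ASCII markup character such as "<" becomes a byte pair containing
0x00 in UTF-16, which is why ordinary documents trip the second check.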

Cheers
Rick Jelliffe

* The only exception I can think of is if we have
 - an external parsed entity in UTF-16,
 - with no encoding header, so defaulting to UTF-8 or HTML's 8859-1,
 - which has only data and no markup,
 - with no ASCII/Latin-1 data, including spaces (these would cause a
   0x00 byte),
 - and whose UTF-16 bytes also correspond to valid UTF-8 or 8859-1
   patterns.

Except for monkeys typing XML, data usually has meaning, so most potential
strings could never in fact appear, which may make this edge case rarer
still.
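
As an illustration of that edge case, here is a small Python sketch.
The sample string and the function name are invented for the example;
the two code points were picked only because their big-endian UTF-16
bytes happen to be ASCII letters.

```python
def slips_past_detection(data: bytes) -> bool:
    """True if the bytes carry no BOM and no 0x00 byte, and also
    decode cleanly as UTF-8 (the conditions listed above)."""
    if data[:2] in (b"\xff\xfe", b"\xfe\xff") or b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical data-only entity: two CJK Extension A characters.
sample = "\u4142\u4344"
raw = sample.encode("utf-16-be")  # big-endian, no BOM written

# The same four bytes are the ASCII letters A, B, C, D, so a parser
# defaulting to UTF-8 (or 8859-1) would read them without complaint.
misread = raw.decode("utf-8")
```

Adding a single space or any markup to the sample introduces a 0x00
byte and the entity is caught again.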

