xml-dev - Re: [xml-dev] UTF-8+names


To: Mike Champion <mc@xegesis.org>
Subject: Re: [xml-dev] UTF-8+names
From: Bill de hÓra <bill.dehora@propylon.com>
Date: Sun, 19 Oct 2003 15:36:25 +0100
Cc: xml-dev@lists.xml.org
In-reply-to: <20031019034635.91162.qmail@web13703.mail.yahoo.com>
References: <20031019034635.91162.qmail@web13703.mail.yahoo.com>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624

Mike Champion wrote:

> Tim's approach is taking a real, widespread problem
> and offering a clean, layered solution -- essentially
> a character encoding preprocessor -- rather than
> changing XML itself.

Exactly. It's a syntax macro (just like namespaces). So the 
*problem* of resolving entities hasn't gone away; specifically, that 
problem is one of getting your entities from one place to another 
and back again. Unless of course we all upgrade to use +names.
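
To make the "preprocessor" reading concrete, here is a minimal sketch in 
Python of what such a decoding layer might look like. It is not taken 
from the draft: the function name decode_utf8_plus_names is 
hypothetical, the name table is Python's HTML4 set only, and leaving 
the five XML predefined entities untouched is an assumption about how 
the layering would have to work.

```python
import re
from html.entities import name2codepoint  # HTML4 entity names only

# Assumption: the XML predefined entities are left for the XML parser.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches references of the form &name;
NAME_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_utf8_plus_names(data):
    """Decode UTF-8 bytes, then expand known named references.

    Hypothetical helper illustrating a 'character encoding
    preprocessor'; it is not the algorithm from the draft.
    """
    text = data.decode("utf-8")

    def expand(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)  # leave &amp; and friends alone
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)  # unknown name: pass through unchanged
        return chr(codepoint)

    return NAME_REF.sub(expand, text)
```

Anything on the far side of this layer sees only characters, so the 
receiving end still has to agree on the same name table, which is the 
"one place to another and back again" problem in a different spot.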

I'd love to know what SOAP-oriented folks think about it.

> Actually, a similar idea came up at the Binary Infoset
> workshop, to leverage/exploit the fact that the XML
> spec allows an open-ended set of encodings. This
> allows experimentation WITHOUT "corrupting" the core
> spec with support for local languages, stuff of
> interest mainly to mainframes, or more efficiently
> transmittable and/or lexable serializations.  

It also potentially hurts interop - there are costs and benefits to 
be weighed up.

> IMHO, it extends the Unicode encoding layer upwards to
> remove a wart in XML, not vice-versa.

I disagree :) It seems to be solving a wart (or a gap in the market) 
of Unicode transformations, not XML. Otherwise why would it be 
useful outside XML?

> 
> Anyway, I think this is a great idea, and I
> congratulate Tim for working it out and moving it
> forward.  

I like it at first glance, but the current draft is too vague. I 
suspect the impact of this encoding is greater than Tim is giving it 
credit for - so I'm not buying arguments from idiotic simplicity 
just yet.

I'd like to:

  o see the encoding name changed to "UTF-8+entities"; the current 
name is rather vague.

  o see examples of escaped whitespace.

  o know whether <w&oacute;oops/> is a legal element name in this 
proposal.

  o hear a rationale, other than use outside XML, for choosing a new 
encoding to solve this problem, i.e. why not xml:entities="yes" or 
some other approach?

  o know whether the current MathML/HTML4 sets are sufficient; i.e. 
are we going to need to reversion this in a couple of years to cater 
for ogham?

  o like elharo and Alessandro, I'm unconvinced about treatment of 
&, specifically, that it isn't being overloaded in some 
clever/sneaky way. Indeed, I'll claim that & /is/ being overloaded 
in some clever/sneaky way until the next draft shows me otherwise.
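
The element name question can be sketched in Python with the standard 
library parser. This only shows what follows if expansion happens at 
the encoding layer, before the XML parser sees the text; whether the 
draft intends that is exactly what I'm asking. The helper root_tag is 
hypothetical, and the plain string replace is a stand-in for whatever 
the real decoder would do.

```python
import xml.etree.ElementTree as ET


def root_tag(document):
    """Return the root element's name, or None if not well-formed."""
    try:
        return ET.fromstring(document).tag
    except ET.ParseError:
        return None


raw = "<w&oacute;oops/>"

# Stand-in for the encoding layer: expand one name before parsing.
expanded = raw.replace("&oacute;", "\u00f3")

# Plain XML: '&' cannot appear inside a name, so this is rejected.
print(root_tag(raw))

# After expansion the parser just sees an ordinary non-ASCII name.
print(root_tag(expanded))
```

If the second case is legal under +names, then & now means something 
in a position where XML itself gives it no meaning, which is the 
overloading I'm worried about.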

Bill de hÓra
-- 
Technical Architect
Propylon
http://www.propylon.com

Follow-Ups:
- Re: [xml-dev] UTF-8+names
  - From: John Cowan <cowan@mercury.ccil.org>
- Re: [xml-dev] UTF-8+names
  - From: Joern Clausen <joern@TechFak.Uni-Bielefeld.DE>

References:
- Re: [xml-dev] UTF-8+names
  - From: Mike Champion <mc@xegesis.org>
