xml-dev - Re: [xml-dev] UTF-8+names


To: John Cowan <cowan@mercury.ccil.org>
Subject: Re: [xml-dev] UTF-8+names
From: Bill de hÓra <bill.dehora@propylon.com>
Date: Sun, 19 Oct 2003 19:35:40 +0100
Cc: xml-dev@lists.xml.org
In-reply-to: <20031019171750.GI20059@mercury.ccil.org>
References: <20031019034635.91162.qmail@web13703.mail.yahoo.com> <3F92A169.60903@propylon.com> <20031019171750.GI20059@mercury.ccil.org>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624

John,

> Bill de hÓra scripsit:
>>I'd love to know what SOAP-oriented folks think about it.
> 
> John Cowan wrote:
> I should think they'd think it's totally useless, just like the original
> proposal for infoset-level names.  Machine-generated XML would have no
> use for UTF-8+names.

That ignores a raft of use cases where human-generated content
goes through machine-generated envelopes - so much the worse for the
envelope crowd, perhaps. So I'll re-ask: I'd love to know what
RSS-oriented folks think about it.

>>see the encoding name changed to "UTF-8+entities", the current 
>>name is rather vague.
> 
> The whole point of "names" is that they are not entities.  In fact,
> we go to some lengths to make sure they can be used with entities
> without interfering with them.

Well, they look a lot like entities to me and have many of the
characteristics of entities, and probably all of their use cases.
Especially the ones being imported from MathML and HTML4. The main
difference seems to be in how they are declared.

>>see examples of escaped whitespace. 
> 
> If you mean strictly XML whitespace, I think the last thing we want
> is to have names like "space", "cr", "lf", or "tab".  These would make
> hideous uglifications like
> 
> 	<foo&space;bar="baz"&space;/>
> 
> possible, and who wants that?

Even so, I'd like some words around whitespace - it's great for
thrashing out details. And I was thinking of cases like

   <foo& ;bar="baz"& ;/>

>>know whether <w&oacute;oops/> is a legal element name in this 
>>proposal. 
> 
> Yes.  To XML, it's an empty-tag whose name contains six characters,
> all of them legal in names.

*Definitely* mention this in the next draft.
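
To make the six-character claim concrete, here is a rough Python sketch
of a decoding layer that expands names before the XML parser ever sees
them. It is only an illustration under my own assumptions, not the
draft's algorithm: the function name decode_names is hypothetical, the
name table is Python's HTML4 table (the draft also draws on MathML),
and leaving XML's five predefined entities untouched is my guess at the
intended behaviour.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint  # HTML4 names only

# Assumption: XML's predefined entities are left for the XML parser.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAME_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_names(raw: bytes) -> str:
    """Hypothetical decoder: decode UTF-8, then expand known names."""
    text = raw.decode("utf-8")

    def expand(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave unknown or predefined names alone
        return chr(name2codepoint[name])

    return NAME_REF.sub(expand, text)


decoded = decode_names(b"<w&oacute;oops/>")
root = ET.fromstring(decoded)  # the parser only ever sees the expanded text
print(root.tag, len(root.tag))
```

With the name expanded at the encoding layer, the parser sees an
ordinary empty-element tag whose name is six characters long.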

>>hear a rationale, other than use outside XML, for choosing a new 
>>encoding to solve this problem, ie why not xml:entities="yes" or 
>>some other approach?
> 
> Because there are already zillions of encodings, and what's one more?
> But getting XML changed in any particular cannot be done without
> immense pain, as no one knows better than I.

Then I see an issue with +names. Once the prolog is lost I'm
hosed. More and more XML is being shipped around inside envelopes,
which are probably UTF-8 or iso-latin. Have we lost some
self-description in this encoding, by it doubling up as a macro and
an encoding (although that's a somewhat fuzzy distinction)?
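
A small Python sketch of the failure I have in mind, assuming a
fragment authored with names is pasted into an envelope that declares
plain UTF-8. The envelope element and the helper name parses are made
up for illustration; the parser is Python's standard ElementTree.

```python
import xml.etree.ElementTree as ET


def parses(document: bytes) -> bool:
    """Return True if the bytes are well-formed XML to a stock parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Fragment written with a name, then embedded in a plain UTF-8 envelope
# (hypothetical element names). The original encoding label is gone.
fragment = "<p>caf&eacute;</p>"
envelope = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<envelope>" + fragment + "</envelope>"
).encode("utf-8")

# Same content using a numeric character reference instead of a name.
numeric = envelope.replace(b"&eacute;", b"&#233;")

print(parses(envelope))  # the name is an undefined entity here
print(parses(numeric))
```

Without the encoding label, the stock parser treats the name as an
undeclared entity reference and rejects the document, whereas the
numeric reference survives the trip.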

>>know whether the current MathML/HTML4 sets are sufficient; ie 
>>are we going to need to reversion this in a couple of years to cater 
>>for ogham?
> 
> 
> A very good question.  My sense is that they are not sufficient, but
> that Ogham isn't going to be an issue either.

Let's not worry about ogham then.  If they are not sufficient,
what's the upgrade path?

>>like elharo and Alessandro, I'm unconvinced about treatment of 
>>&, specifically, that it isn't being overloaded in some 
>>clever/sneaky way. Indeed, I'll claim that & /is/ being overloaded 
>>in some clever/sneaky way until the next draft shows me otherwise.
> 
> What *concrete* action would it take to convince you?  Proving a negative
> (that there is nothing underhanded here) is notoriously difficult to
> impossible: the unconvincee can always retort "But there still *could*
> be something wrong!"

Ok. You could start by addressing Elliotte's, Alessandro's and Mike's
posts on the matter. Then use the answers to make the draft clearer
- by clearer I mean reduce the need for the reader to make the right
inferences, or reduce the ability to jump to the wrong conclusions
(depending on your outlook :).

I'm not asking these questions for the sake of argument, nor am I
looking for proof of the non-existence of unicorns. I have direct
use-cases with customers where +names could be useful. But taking
myself as a sample of one, this first draft failed the idiotically
simple test. That's not to say it doesn't have merit; any progress
on this problem is welcome.

Bill de hÓra

-- 
Technical Architect
Propylon
http://www.propylon.com

Follow-Ups:
- Re: [xml-dev] UTF-8+names
  - From: John Cowan <cowan@mercury.ccil.org>
- Re: [xml-dev] UTF-8+names
  - From: Tim Bray <tbray@textuality.com>

References:
- Re: [xml-dev] UTF-8+names
  - From: Mike Champion <mc@xegesis.org>
- Re: [xml-dev] UTF-8+names
  - From: Bill de hÓra <bill.dehora@propylon.com>
- Re: [xml-dev] UTF-8+names
  - From: John Cowan <cowan@mercury.ccil.org>
