xml-dev - RE: [xml-dev] UTF-8+names

To: "'Seairth Jacobs'" <seairth@seairth.com>, "'xml-dev'" <xml-dev@lists.xml.org>
Subject: RE: [xml-dev] UTF-8+names
From: "Alessandro Triglia" <sandro@mclink.it>
Date: Sat, 18 Oct 2003 15:46:20 -0400
Importance: Normal
In-reply-to: <002701c395ab$41617c80$a600a8c0@SeairthA31>



> -----Original Message-----
> From: Seairth Jacobs [mailto:seairth@seairth.com] 
> Sent: Saturday, October 18, 2003 15:09
> To: xml-dev
> Subject: Re: [xml-dev] UTF-8+names
> 
> 
> From: "Tim Bray" <tbray@textuality.com>
> >
> > Check out http://tbray.org/tag/utf-8+names.html
> 
> Instead of throwing an error for a missing semicolon, why not 
> let that case fall under section 4.  Since the set of 
> pre-defined reference names are known, you also know the 
> maximum length possible.  If you reach length+1 characters, 
> you know it's not a reference and leave it as-is.  It also 
> means that a lone ampersand won't be an error.  I suspect the 
> motivation to require the semicolon is for consistency with 
> XML's same requirements. However, this disallows some valid 
> UTF-8 from being valid UTF-8+names as well.  While this may 
> be fine if used solely within the context of XML (which would 
> throw an error as soon as it hit the invalid reference 
> anyhow), I suspect people will try using this outside of XML as well.
> 
> Also, I suspect that using the same format as XML/SGML's 
> references is going to confuse people.  Maybe use a similar 
> format such as #name; or @name;. This way, at least the two 
> kinds of references (UTF-8+names and XML) would not be confused.


I think there are other problems.

As I understand it, in UTF-8+names an ampersand is represented as  &&;  which
means that, if UTF-8+names is used for XML, "normal" entity references will
look like:

	&&;myentity;

and numeric character references will look like:

	&&;#12345;

which is ugly.
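
To make the doubling concrete, here is a minimal sketch in Python. It
assumes only what I stated above, namely that a literal ampersand is written
as  &&;  in UTF-8+names; the function name escape_for_utf8_names is made up
for illustration and is not part of any proposal.

```python
def escape_for_utf8_names(xml_text: str) -> str:
    """Escape each literal '&' as '&&;' (the escape assumed in this message)."""
    return xml_text.replace("&", "&&;")


# XML entity and character references, as they would appear on the wire.
print(escape_for_utf8_names("&myentity;"))
print(escape_for_utf8_names("&#12345;"))
```

Every ampersand that XML itself needs has to go through this step before
the document is written out.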

In addition, "UTF-8+names entities" would have the usual syntax:

	&lt;

but this can be confusing, because it would denote a **literal** < character,
not one obtained by including the entity.  As a consequence, I would *not*
be allowed to use  &lt;  in the value of an attribute, for example, but
would have to write  &&;lt;  for the same purpose.
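
A rough sketch of the two layers, in Python, may help. It assumes that the
encoding layer is decoded before the XML parser sees the text, and that  &&;
stands for a literal ampersand. The two-entry NAMES table and the helper
names are hypothetical; the real proposal has its own list of names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical two-entry name table, for illustration only.
NAMES = {"lt": "<", "gt": ">"}

# Matches "&&;" (assumed escape for a literal ampersand) or "&name;".
_REF = re.compile(r"&(&|[A-Za-z][A-Za-z0-9]*);")


def decode_utf8_names(text: str) -> str:
    """Expand encoding-level references; the result is what XML would parse."""
    def expand(match):
        name = match.group(1)
        if name == "&":
            return "&"
        if name not in NAMES:
            raise ValueError(f"unknown name: {name!r}")
        return NAMES[name]
    return _REF.sub(expand, text)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


plain = decode_utf8_names('<a b="&lt;"/>')      # encoding layer yields a raw <
escaped = decode_utf8_names('<a b="&&;lt;"/>')  # encoding layer yields &lt;
print(plain, is_well_formed(plain))
print(escaped, is_well_formed(escaped))
```

Under these assumptions the first form reaches the XML parser with a raw <
inside the attribute value, which is not well-formed, while the second
reaches it as an ordinary  &lt;  reference.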

I think that these problems can be overcome by using something different
from the ampersand, as you suggest.

As for the final  ;  being mandatory, I believe it should be, for
robustness.
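
To illustrate the trade-off, here is a small Python sketch that contrasts a
strict decoder with the lenient look-ahead suggested in the quoted message.
It is only a sketch under assumptions: the three-entry NAMES table, the
decode function, and its strict flag are all invented for this example.

```python
# Hypothetical name table; the longest name bounds the look-ahead.
NAMES = {"lt": "<", "gt": ">", "eacute": "\u00e9"}
MAX_LEN = max(len(name) for name in NAMES)


def decode(text: str, strict: bool = True) -> str:
    """Expand '&name;' and '&&;'. In lenient mode, leave other '&' as-is."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        # Look for ';' within at most MAX_LEN + 1 characters after the '&'.
        end = text.find(";", i + 1, i + 2 + MAX_LEN)
        name = text[i + 1:end] if end != -1 else None
        if name == "&":
            out.append("&")
            i = end + 1
        elif name in NAMES:
            out.append(NAMES[name])
            i = end + 1
        elif strict:
            raise ValueError(f"malformed reference at offset {i}")
        else:
            out.append("&")  # lenient: treat as a plain ampersand
            i += 1
    return "".join(out)


print(decode("caf&eacute;"))
print(decode("AT&T", strict=False))
print(decode("x &lt y", strict=False))  # the typo passes through silently
```

The lenient mode accepts a lone ampersand, but it also lets a forgotten
semicolon slip through unnoticed, which is the robustness concern.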

It is not very clear to me where UTF-8+names would be useful, as I don't
think it is useful in XML.  Is it being proposed for use in areas where, for
some reason, XML cannot be used?

Alessandro Triglia
OSS Nokalva


> 
> ---
> Seairth Jacobs (seairth@seairth.com)
> Looking: http://www.seairth.com/blog/65
> 
> 