   RE: [xml-dev] UTF-8+names


> -----Original Message-----
> From: Seairth Jacobs [mailto:seairth@seairth.com] 
> Sent: Saturday, October 18, 2003 15:09
> To: xml-dev
> Subject: Re: [xml-dev] UTF-8+names
> From: "Tim Bray" <tbray@textuality.com>
> >
> > Check out http://tbray.org/tag/utf-8+names.html
> Instead of throwing an error for a missing semicolon, why not 
> let that case fall under section 4.  Since the set of 
> pre-defined reference names is known, you also know the 
> maximum length possible.  If you reach length+1 characters, 
> you know it's not a reference and leave it as-is.  It also 
> means that a lone ampersand won't be an error.  I suspect the 
> motivation to require the semicolon is for consistency with 
> XML's same requirements. However, this disallows some valid 
> UTF-8 from being valid UTF-8+names as well.  While this may 
> be fine if used solely within the context of XML (which would 
> throw an error as soon as it hit the invalid reference 
> anyhow), I suspect people will try using this outside of XML as well.
> Also, I suspect that using the same format as XML/SGML's 
> references is going to confuse people.  Maybe use a similar 
> format such as #name; or @name;. This way, at least the two 
> references (UTF-8+name and XML) would not be confused.
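
The bounded lookahead suggested in the quoted text can be sketched roughly as below. This is only an illustration, not anything taken from the proposal itself: the name table NAMES is a tiny hypothetical stand-in for the real set of names, decode_lenient is a made-up function name, and the "&" entry assumes that a literal ampersand is written as &&;.

```python
# Hypothetical, deliberately tiny name table. The real set of names would
# come from the UTF-8+names proposal; the "&" entry assumes "&&;" stands
# for a literal ampersand.
NAMES = {"lt": "<", "gt": ">", "eacute": "\u00e9", "&": "&"}
MAX_LEN = max(len(name) for name in NAMES)


def decode_lenient(text: str) -> str:
    """Expand known &name; references and leave everything else as-is."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            # Search for the semicolon no further than MAX_LEN characters
            # past the ampersand, plus one position for the ";" itself.
            end = text.find(";", i + 1, i + 2 + MAX_LEN)
            name = text[i + 1:end] if end != -1 else None
            if name in NAMES:
                out.append(NAMES[name])
                i = end + 1
                continue
        # Not a known reference: copy the character through unchanged,
        # so a lone ampersand is not an error.
        out.append(text[i])
        i += 1
    return "".join(out)
```

With this table, decode_lenient("a &lt; b") gives "a < b", while "AT&T" and "&toolongname;" pass through untouched.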

I think there are other problems.

As I understand, in UTF-8+name, an ampersand is represented as  &&;  which
means that, if UTF-8+name is used for XML, "normal" entity references will
look like:

    &&;name;

and numeric character references will look like:

    &&;#nnn;

which is ugly.

In addition, "UTF-8+name entities" would have the usual syntax:

    &lt;

but this can be confusing because it would denote a **literal** < character,
not one obtained by including the entity.  As a consequence, I would *not*
be allowed to use  &lt;  in the value of an attribute, for example, but
would have to use   &&;lt;  for the same purpose.
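
To make the consequence concrete, here is a minimal sketch of what carrying ordinary XML inside UTF-8+name would involve, assuming (as I understand it above) that a literal ampersand is written as &&;. The function name escape_for_utf8_names is hypothetical.

```python
def escape_for_utf8_names(xml_text: str) -> str:
    """Rewrite every ampersand so that a UTF-8+name decoder yields the
    original XML text again.

    Assumption: "&&;" is the UTF-8+name spelling of a literal "&".
    """
    return xml_text.replace("&", "&&;")


xml_source = '<a title="1 &lt; 2">&#169;</a>'
escaped = escape_for_utf8_names(xml_source)
# escaped == '<a title="1 &&;lt; 2">&&;#169;</a>'
```

Every XML entity reference and numeric character reference picks up the extra &; sequence, which is the ugliness I mean.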

I think that these problems can be overcome by using something different
from the ampersand, as you suggest.

As for the final  ;  being mandatory, I believe it should be, for

It is not very clear to me where UTF-8+name would be useful, as I don't
think it is useful in XML.  Is it being proposed for use in areas where, for
some reason, XML cannot be used?

Alessandro Triglia
OSS Nokalva

> ---
> Seairth Jacobs (seairth@seairth.com)
> Looking: http://www.seairth.com/blog/65