> -----Original Message-----
> From: Tim Bray [mailto:tbray@textuality.com]
> Sent: Saturday, October 18, 2003 17:21
> To: Alessandro Triglia
> Cc: 'Seairth Jacobs'; 'xml-dev'
> Subject: Re: [xml-dev] UTF-8+names
>
>
> Alessandro Triglia wrote:
>
> > As I understand it, in UTF-8+names an ampersand is represented as &&;
> > which means that, if UTF-8+names is used for XML, "normal" entity
> > references will look like:
> >
> > &&;myentity;
> >
> > and numeric character references will look like:
> >
> > &&;#12345;
>
> No. &&; represents an ampersand. Normally it wouldn't be used in text
> you were going to feed to an XML processor because XML processors
> don't like that.
But if an XML processor understands UTF-8+names, it will invoke the
UTF-8+names codec to translate bytes into Unicode characters. Since the
bytes 0x26 0x26 0x3B (&&;) decode as AMPERSAND, that is the character the
XML processor will see. It will (rightly) interpret this AMPERSAND as the
beginning of an entity reference or numeric character reference in the two
cases above.
(If the XML processor does not understand UTF-8+names, it will encounter all
those sequences undecoded in the document and will not know what to do with
them. So we are necessarily assuming that the XML processor is aware of
encoding="utf-8+names" and understands it, and therefore the previous
paragraph stands.)
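
To make the decoding step concrete, here is a minimal Python sketch of what
such a codec might do before the XML processor sees any characters. It is
only an illustration under stated assumptions: the function name
decode_utf8_names and the tiny NAMES table are hypothetical, and passing
names with no assigned replacement through unchanged is my reading of the
quoted explanation, not something taken from a specification.

```python
import re

# Hypothetical, deliberately tiny name table (an assumption for illustration;
# the actual UTF-8+names table is not reproduced here).
NAMES = {"uuml": "\u00fc", "eacute": "\u00e9"}

# Matches either the escaped ampersand "&&;" or a named sequence "&name;".
_TOKEN = re.compile(r"&(&|[A-Za-z][A-Za-z0-9]*);")


def decode_utf8_names(data: bytes) -> str:
    """Decode bytes as UTF-8, then expand &&; and known &name; sequences."""
    text = data.decode("utf-8")

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name == "&":
            # "&&;" decodes to a single AMPERSAND character.
            return "&"
        # Names without an assigned replacement are left exactly as written.
        return NAMES.get(name, match.group(0))

    return _TOKEN.sub(replace, text)


if __name__ == "__main__":
    for sample in (b"&&;myentity;", b"&&;#12345;", b"&uuml;", b"&amp;"):
        print(sample, "->", decode_utf8_names(sample))
```

Under these assumptions, the bytes for &&;myentity; come out of the codec as
an AMPERSAND followed by "myentity;", which is exactly the character
sequence an XML processor treats as an entity reference.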
Alessandro
> &amp; represents just "&amp;" because UTF-8+names doesn't
> assign a replacement. &uuml; represents a single u+umlaut character,
> inherited from HTML.
>
> --
> Cheers, Tim Bray (http://www.tbray.org/ongoing/)
>
>
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org
> <http://www.xml.org>, an initiative of OASIS <http://www.oasis-open.org>
The list archives are at http://lists.xml.org/archives/xml-dev/
To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>