xml-dev - FW: [xml-dev] UTF-8+names

To: "[Public XML-DEV]" <xml-dev@lists.xml.org>
Subject: FW: [xml-dev] UTF-8+names
From: "Alessandro Triglia" <sandro@mclink.it>
Date: Mon, 20 Oct 2003 15:56:40 -0400
Importance: Normal

I wrote:
> 
> Another fact that I think has been overlooked is the following.
> 
> The following fragment of XML (encoded in UTF-8+names but 
> displayed as if it were encoded in UTF-8) contains exactly 18 
> Unicode characters:
> 
> 	<a>one&nbsp;two&lt;</a>
> 
> because   &nbsp;   counts as one character and   &lt;   
> counts as 4 characters.
> 
> The UTF-8+names encoding of this fragment of XML occupies 23 
> bytes.  The UTF-8 encoding occupies 19 bytes.

... and, by the way, the following fragment of XML is different from the one above (although it *looks* the same in this email) and contains 23 Unicode characters instead of 18, because here   &nbsp;   is a genuine XML entity reference that counts as 6 characters:

	<a>one&nbsp;two&lt;</a>

The UTF-8 encoding of this fragment of XML occupies 23 bytes.  The UTF-8+names encoding is longer than that, because the first ampersand must be encoded as the three ASCII bytes   & & ;   so that the XML entity reference   &nbsp;   is not mistaken for the pseudo-entity   &nbsp;   of UTF-8+names.
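
To make the counts above easy to check, here is a minimal Python sketch of a toy UTF-8+names codec. It is an illustration under stated assumptions, not an implementation of the actual proposal: the name table NAMES holds only "nbsp", the functions encode_names and decode_names are hypothetical, names the table does not know (such as the XML built-in "lt") pass through unchanged, and a literal ampersand is escaped as the three bytes described in this message. A literal occurrence of that escape sequence in the text is not handled.

```python
import re

# Hypothetical, deliberately tiny name table (assumption): a real table
# would list many more character names.
NAMES = {"nbsp": "\u00a0"}

# Escape for a literal ampersand, as described in the message (three ASCII bytes).
AMP_ESCAPE = "&&;"

_NAME_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
_TOKEN = re.compile(r"&&;|&([A-Za-z][A-Za-z0-9]*);")


def encode_names(text):
    """Encode Unicode text as UTF-8+names bytes (simplified sketch)."""

    def escape(match):
        # An XML reference whose name is also a pseudo-entity name must have
        # its ampersand escaped, or the decoder would swallow it.
        if match.group(1) in NAMES:
            return AMP_ESCAPE + match.group(0)[1:]
        return match.group(0)

    escaped = _NAME_REF.sub(escape, text)
    # Named characters are written out as pseudo-entities.
    for name, char in NAMES.items():
        escaped = escaped.replace(char, "&" + name + ";")
    return escaped.encode("utf-8")


def decode_names(data):
    """Decode UTF-8+names bytes back into Unicode text (simplified sketch)."""

    def expand(match):
        if match.group(0) == AMP_ESCAPE:
            return "&"
        # Unknown names (e.g. "lt") are left for the XML parser.
        return NAMES.get(match.group(1), match.group(0))

    return _TOKEN.sub(expand, data.decode("utf-8"))


# First fragment: a real NO-BREAK SPACE character between "one" and "two".
first = "<a>one\u00a0two&lt;</a>"
print(len(first), len(first.encode("utf-8")), len(encode_names(first)))
# 18 characters, 19 bytes in UTF-8, 23 bytes in UTF-8+names

# Second fragment: the six characters & n b s p ; as an XML entity reference.
second = "<a>one&nbsp;two&lt;</a>"
print(len(second), len(second.encode("utf-8")), len(encode_names(second)))
# 23 characters, 23 bytes in UTF-8, 25 bytes in this sketch's UTF-8+names
```

Under these assumptions the second fragment grows by two bytes, since one ampersand byte becomes three. Decoding either encoded form with decode_names returns the original characters, which is the point of the escape.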

Alessandro
