   FW: [xml-dev] UTF-8+names

  • To: "[Public XML-DEV]" <xml-dev@lists.xml.org>
  • Subject: FW: [xml-dev] UTF-8+names
  • From: "Alessandro Triglia" <sandro@mclink.it>
  • Date: Mon, 20 Oct 2003 15:56:40 -0400
  • Importance: Normal



I wrote:
> 
> Another fact that I think has been overlooked is the following.
> 
> The following fragment of XML (encoded in UTF-8+names but 
> displayed as if it were encoded in UTF-8) contains exactly 18 
> Unicode characters:
> 
> 	<a>one&nbsp;two&lt;</a>
> 
> because   &nbsp;   counts as one character and   &lt;   
> counts as 4 characters.
> 
> The UTF-8+names encoding of this fragment of XML occupies 23 
> bytes.  The UTF-8 encoding occupies 19 bytes.


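... and, by the way, the following fragment of XML is different from the one above (although it *looks* the same in this email). It contains 23 Unicode characters instead of 18, because here  &nbsp;  is an ordinary XML entity reference made of six characters rather than a single no-break space: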

	<a>one&nbsp;two&lt;</a>

The UTF-8 encoding of this fragment of XML occupies 23 bytes, since all 23 characters are ASCII.  The UTF-8+names encoding is longer than that, because the first ampersand must be encoded as the three ASCII bytes    & & ;   so that the XML entity reference  &nbsp;  is not mistaken for the pseudo-entity  &nbsp;.
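
The counts for both fragments can be checked with a few lines of Python. This is only a rough sketch: the helper names are made up for illustration, and encode_utf8_names() is a hypothetical stand-in that knows just the one name "nbsp" and escapes only the ampersand of a literal "&nbsp;", which is all the examples here rely on. Whether other ampersands (such as the one in "&lt;") would also need escaping is not covered.

```python
NBSP = "\u00a0"  # the single Unicode character NO-BREAK SPACE

# First fragment: the no-break space is one real character.
first = "<a>one" + NBSP + "two&lt;</a>"
# Second fragment: "&nbsp;" is six literal characters (an XML entity reference).
second = "<a>one&nbsp;two&lt;</a>"


def utf8_len(text: str) -> int:
    """Number of bytes in the plain UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def encode_utf8_names(text: str) -> bytes:
    """Hypothetical, nbsp-only sketch of a UTF-8+names encoder.

    Assumption: a literal "&nbsp;" has its ampersand escaped as "&&;",
    and the character U+00A0 is written as the ASCII name "&nbsp;".
    """
    escaped = text.replace("&nbsp;", "&&;nbsp;")  # protect the literal reference
    named = escaped.replace(NBSP, "&nbsp;")       # spell the character by name
    return named.encode("utf-8")


for label, fragment in (("first", first), ("second", second)):
    print(label, len(fragment), utf8_len(fragment), len(encode_utf8_names(fragment)))
```

Under those assumptions the first fragment comes out as 18 characters, 19 bytes in UTF-8 and 23 bytes in UTF-8+names, and the second as 23 characters, 23 bytes in UTF-8 and more than 23 bytes in UTF-8+names.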

Alessandro







 
