   RE: [xml-dev] UTF-8+names

  • To: "[Public XML-DEV]" <xml-dev@lists.xml.org>
  • Subject: RE: [xml-dev] UTF-8+names
  • From: "Alessandro Triglia" <sandro@mclink.it>
  • Date: Mon, 20 Oct 2003 15:27:30 -0400
  • Importance: Normal


Another fact that I think has been overlooked is the following.

The following fragment of XML (encoded in UTF-8+names but displayed as if it were encoded in UTF-8) contains exactly 18 Unicode characters:

	<a>one&nbsp;two&lt;</a>

because &nbsp; counts as one character, whereas &lt; counts as four characters.

The UTF-8+names encoding of this fragment of XML occupies 23 bytes.  The UTF-8 encoding occupies 19 bytes.
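The counts above can be checked with a minimal sketch. It assumes a decoder that replaces only the names found in its table and leaves every other reference (such as &lt;) untouched. The one-entry NAMES table and the decode_utf8_names helper are hypothetical stand-ins for illustration, not the draft's actual table or algorithm.

```python
import re

# Hypothetical one-entry name table; the draft lists 2000+ names.
NAMES = {"nbsp": "\u00a0"}

def decode_utf8_names(data: bytes) -> str:
    """Decode bytes as UTF-8, then replace each &name; found in NAMES."""
    text = data.decode("utf-8")

    def substitute(match: re.Match) -> str:
        # Names missing from the table (e.g. "lt") are left as literal text.
        return NAMES.get(match.group(1), match.group(0))

    return re.sub(r"&([A-Za-z][A-Za-z0-9.-]*);", substitute, text)

raw = b"<a>one&nbsp;two&lt;</a>"
chars = decode_utf8_names(raw)

print(len(raw))                    # 23 bytes as UTF-8+names
print(len(chars))                  # 18 Unicode characters
print(len(chars.encode("utf-8")))  # 19 bytes as plain UTF-8
```

The one-byte gap between 18 characters and 19 bytes comes from the no-break space, which takes two bytes in UTF-8.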

Now, while &nbsp; is easy to remember as one of the magic pseudo-entities, what about the other 2000+ pseudo-entities listed in the draft? Can anybody determine, without doing a lookup, how many Unicode characters there are in

	<a>one&column-separator;two&lt;</a>

?
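To make the ambiguity concrete, here is a small sketch that computes both possible answers. It does not consult the draft's table. It only assumes that, if column-separator is a listed name, the reference stands for a single character.

```python
fragment = "<a>one&column-separator;two&lt;</a>"
reference = "&column-separator;"

# Case 1: the name is not in the table, so every character is literal.
count_if_literal = len(fragment)

# Case 2 (assumption): the name is in the table and maps to one character.
count_if_name = len(fragment) - len(reference) + 1

print(count_if_literal)  # 35
print(count_if_name)     # 18
```

The same bytes yield either 35 or 18 characters, and only the lookup decides which.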

Is the general opinion here that this kind of confusion is not important (say, not important to software vendors and not important to users of XML technologies)?

Alessandro







 
