   Re: [xml-dev] Unicode and XML (was Re: [xml-dev] Remembering the origina


Previously swallowed by the xml-dev bit bucket:

Tony Graham wrote at 17 Feb 2003 17:13:36 +0000:
 > Gavin Thomas Nicol wrote at 16 Feb 2003 14:16:23 -0500:
 >  > On Sunday 16 February 2003 12:35 pm, Mike Champion wrote:
 >  > > Stupid question:  Why couldn't XML incorporate Unicode by reference rather
 >  > > than spending half of the spec defining the "unicode-character apparatus"?
 >  > 
 >  > There are a fair number of characters that really don't make much sense as 
 >  > markup... and XML 1.0 is pretty conservative, but generally sensible. At the 
 >  > time, there were no good guidelines from the Unicode consortium on what 
 >  > should/should not be allowed, which is something they have addressed 
 >  > recently. 
 > 
 > The Unicode Standard, Version 2.0, was published in 1996.  Section
 > 5.14, Identifiers, contains guidelines for "the definition of
 > identifier syntax."
 > 
 > Unicode 2.1, which was approved eight days after XML was approved, did
 > add the simplifying mapping of syntactic classes in the "Identifier"
 > section to the character classes in the Unicode Character Database
 > (UCD) but didn't change the substance of the guidelines.
 > 
 > Section 5.16, Identifiers, of the Unicode Standard, Version 3.0, kept
 > the verbiage about what makes a good identifier, kept the mapping to
 > character classes, and dropped most of the syntactic classes, for no
 > real change in the guidelines.
 > 
 > The Unicode Standard didn't, and still doesn't, prescribe the
 > identifier syntax because "each programming language standard has its
 > own identifier syntax".
 > 
 > XML 1.0 was always going to have to define its identifier syntax,
 > i.e., its name characters, because XML allows ":", "_", "-", and "."
 > in names (whereas other, non-XML standards have their own lists of
 > extras).
 > 
 > XML 1.0 names are mostly defined in terms of UCD character classes,
 > and the suggestions for XML 1.1 names are still mostly based on those
 > character classes.  The hard work in defining XML 1.0 names would have
 > been resolving the inconsistencies in the Unicode Character Database,
 > both w.r.t. canonical equivalence (x0387) and because the UCD used to
 > contain a 'PropList.txt' file that was provided without explanation
 > and that sometimes contradicted the information in the main
 > 'UnicodeData.txt' file.
 > 
 > (And the status of 'PropList.txt' is something that has been addressed
 > recently.)
 > 
 > Regards,
 > 
 > 
 > Tony Graham
 > ------------------------------------------------------------------------
 > XML Technology Center - Dublin
 > Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
 > Hamilton House, East Point Business Park, Dublin 3            x(70)19708
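
Editorial illustration of the two technical points in the message above: names built from UCD character classes plus the XML extras (":", "_", "-", "."), and the canonical-equivalence wrinkle at x0387. This is a rough sketch only, not the XML 1.0 name productions. The function names are hypothetical, the category sets follow the rule of thumb given in the XML 1.0 character-class appendix, and Python's unicodedata module reflects whatever Unicode version ships with the interpreter rather than Unicode 2.0, so individual characters may classify differently than they did in 1998.

```python
import unicodedata

# Punctuation that XML names allow beyond what the UCD classes provide.
XML_EXTRAS = {":", "_", "-", "."}

# General categories used as an approximation of XML 1.0 name characters.
# Assumption: a simplification for illustration, not the spec's exact ranges.
START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mc", "Me", "Mn", "Lm", "Nd"}


def is_name_start(ch):
    """Approximate test for a character that may begin a name."""
    return ch in {":", "_"} or unicodedata.category(ch) in START_CATEGORIES


def is_name_char(ch):
    """Approximate test for a character that may continue a name."""
    return ch in XML_EXTRAS or unicodedata.category(ch) in CONTINUE_CATEGORIES


def looks_like_name(s):
    """True if every character passes the approximate class checks."""
    return bool(s) and is_name_start(s[0]) and all(is_name_char(c) for c in s[1:])


def canonical_form(ch):
    """Return the NFC form; for x0387 this is a different code point."""
    return unicodedata.normalize("NFC", ch)


if __name__ == "__main__":
    for candidate in ["xml:lang", "a.b-c_d", "-foo", "1abc"]:
        print(candidate, looks_like_name(candidate))
    # x0387 (GREEK ANO TELEIA) normalizes to x00B7 (MIDDLE DOT), so a name
    # definition has to decide how to treat both code points.
    print(hex(ord(canonical_form("\u0387"))))
```

The split into a "start" set and a "continue" set mirrors the way identifier guidelines are usually phrased; the extras set is the part each standard has to choose for itself.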


Copyright 2001 XML.org. This site is hosted by OASIS