Previously swallowed by the xml-dev bit bucket:
Tony Graham wrote at 17 Feb 2003 17:13:36 +0000:
> Gavin Thomas Nicol wrote at 16 Feb 2003 14:16:23 -0500:
> > On Sunday 16 February 2003 12:35 pm, Mike Champion wrote:
> > > Stupid question: Why couldn't XML incorporate Unicode by reference rather
> > > than spending half of the spec defining the "unicode-character apparatus"?
> >
> > There are a fair number of characters that really don't make much sense as
> > markup... and XML 1.0 is pretty conservative, but generally sensible. At the
> > time, there were no good guidelines from the Unicode consortium on what
> > should/should not be allowed, which is something they have addressed
> > recently.
>
> The Unicode Standard, Version 2.0, was published in 1996. Section
> 5.14, Identifiers, contains guidelines for "the definition of
> identifier syntax."
>
> Unicode 2.1, which was approved eight days after XML was approved, did
> add the simplifying mapping of syntactic classes in the "Identifier"
> section to the character classes in the Unicode Character Database
> (UCD) but didn't change the substance of the guidelines.
>
> Section 5.16, Identifiers, of the Unicode Standard, Version 3.0, kept
> the verbiage about what makes a good identifier, kept the mapping to
> character classes, and dropped most of the syntactic classes, for no
> real change in the guidelines.
>
> The Unicode Standard didn't, and still doesn't, prescribe the
> identifier syntax because "each programming language standard has its
> own identifier syntax".
>
> XML 1.0 was always going to have to define its identifier syntax,
> i.e., its name characters, because XML allows ":", "_", "-", and "."
> in names (whereas other, non-XML standards have their own lists of
> extras).
>
> XML 1.0 names are mostly defined in terms of UCD character classes,
> and the suggestions for XML 1.1 names are still mostly based on those
> character classes. The hard work in defining XML 1.0 names would have
> been resolving the inconsistencies in the Unicode Character Database,
> both w.r.t. canonical equivalence (x0387) and because the UCD used to
> contain a 'PropList.txt' file that was provided without explanation
> and that sometimes contradicted the information in the main
> 'UnicodeData.txt' file.
>
> (And the status of 'PropList.txt' is something that has been addressed
> recently.)
>
> Regards,
>
>
> Tony Graham
> ------------------------------------------------------------------------
> XML Technology Center - Dublin
> Sun Microsystems Ireland Ltd Phone: +353 1 8199708
> Hamilton House, East Point Business Park, Dublin 3 x(70)19708
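
To make the point about names and UCD character classes concrete, here is a rough Python sketch of a name check built from general categories plus the four extra punctuation characters mentioned above. It is an approximation only: the function names are made up for illustration, the category sets follow the rules described in Appendix B of the XML 1.0 spec as I understand them, and Python's unicodedata module reflects a much newer Unicode version than the one the XML 1.0 tables were derived from. It also ignores the spec's individual exceptions, so it will not agree with the real Name production for every character.

```python
import unicodedata

# General categories allowed for the first character of a name
# (assumed from the rules in XML 1.0 Appendix B).
START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
# Additional categories allowed after the first character.
EXTRA_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def is_name_start(ch):
    """Letter-like character, or one of the XML extras ':' and '_'."""
    return ch in (":", "_") or unicodedata.category(ch) in START_CATEGORIES


def is_name_char(ch):
    """Name-start character, '-' or '.', or a mark, modifier or digit."""
    return (
        is_name_start(ch)
        or ch in ("-", ".")
        or unicodedata.category(ch) in EXTRA_CATEGORIES
    )


def looks_like_xml_name(s):
    """Approximate test only; not the exact XML 1.0 Name production."""
    return bool(s) and is_name_start(s[0]) and all(is_name_char(c) for c in s[1:])
```

With this sketch, "xsl:template" and "_a-b.c" pass, while "-abc" and "1abc" fail because "-" and digits are only allowed after the first character.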