xml-dev - Re: First draft of proposed XML TC for Unicode 3.0 (unofficial)

From: John Cowan <cowan@locke.ccil.org>
To: xml-dev@ic.ac.uk
Date: Fri, 10 Sep 1999 17:11:47 -0400 (EDT)

Nik O scripsit:

> It is a given that changes from Unicode 2.0 to 3.0 will require changes to
> XML 1.0, and thus all existing XML-compliant parsers will cease to be
> compliant when the changes are made.  These Unicode changes aren't
> "corrections of printers errors" -- they are real changes in the XML spec,
> and will require changes to XML parsers and apps, as well.

I think only to the parsers, and only to their lowest levels at that;
apps are probably not sensitive to what characters appear in names
for the most part.

> I guess my previous message was sufficiently obtuse, since my real intention
> was to raise the issue of how these sorts of changes are to be managed.

Obscure (hard to understand) maybe, but not obtuse (stupid).

> I should have said that "..the new BaseChars changes _will_ break _all_
> existing XML parsers and/or apps..".  Once this change is made to XML,
> existing parsers won't be compliant since they've implemented BNF rule 85
> from REC-xml-19980210, and thus won't recognize these new BaseChars (e.g.
> #x01F6) as legal name characters.

Correct.  And in fact there may wind up being XML 1.0 vs. XML 1.0 with
Unicode 3.0 support (a la SGML TCs).  Not, I trust, XML 1.1.
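
To make the point concrete, here is a minimal sketch of the kind of
lowest-level check a parser performs.  The table is only an excerpt (the
Latin ranges) of the BaseChar production, rule 85, in REC-xml-19980210,
transcribed by hand, so check it against the Recommendation before relying
on it.  The names BASECHAR_EXCERPT and is_basechar_xml10 are invented for
this illustration.

```python
# Excerpt (Latin ranges only) of BaseChar, rule 85 of REC-xml-19980210.
# The real production is much longer; this is an illustration, not a
# complete or authoritative table.
BASECHAR_EXCERPT = [
    (0x0041, 0x005A), (0x0061, 0x007A),
    (0x00C0, 0x00D6), (0x00D8, 0x00F6), (0x00F8, 0x00FF),
    (0x0100, 0x0131), (0x0134, 0x013E), (0x0141, 0x0148),
    (0x014A, 0x017E), (0x0180, 0x01C3), (0x01CD, 0x01F0),
    (0x01F4, 0x01F5), (0x01FA, 0x0217),
]


def is_basechar_xml10(ch):
    """Return True if ch falls in one of the excerpted BaseChar ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in BASECHAR_EXCERPT)


if __name__ == "__main__":
    # U+01F6 sits in the gap between #x01F5 and #x01FA, so a parser built
    # from this table rejects it as a name character.
    for ch in ("A", "\u01F5", "\u01F6"):
        print("U+%04X" % ord(ch), is_basechar_xml10(ch))
```

Supporting Unicode 3.0 in such a parser would amount to widening this
table, which is why the change stays at the lowest level.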

> Very true, but isn't this [backward-compatibility rules] the top of
> a slippery slope, whereby every change
> to Unicode might require yet another special rule to maintain backward
> compatibility?

In principle yes, in fact no.  The Unicode Consortium and WG2 are focused
on adding support for new characters in Unicode 4.0 and beyond, not on
making changes to old stuff, though no one can rule out the possibility
of an error that has simply gone undetected hitherto, like the classification
of U+212E ESTIMATED SYMBOL as a lower-case letter.  (The glyph looks
a lot like "e", but isn't subject to font variants, which accounts for
the persistent error.)

In particular, for reasons having to do with canonical forms of Unicode,
decompositions are most unlikely to change henceforth.  So while we can
expect some new XML name and name-start characters in Unicode 4.0,
new backwards compatibility hacks are rather unlikely.

> It would be possible to define XML characters as being based directly upon
> the current Unicode data tables, i.e., replace the whole BNF rule 85 table
> with a rule that directly referenced Unicode: "BaseChar ::= [..what Unicode
> says..]".  I realise that this example isn't real BNF, but it is just as
> valid a method of specifying characters.  We could perhaps refer to
> Unicode's BNF rules for the purpose of the XML grammar, but use Unicode
> tables for actual XML implementations.

Tim Bray laid out the perceived risks here pretty nicely.
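
For illustration only, here is a rough sketch of what "BaseChar ::= [..what
Unicode says..]" could look like in practice, using Python's standard
unicodedata module.  It assumes the general categories that XML 1.0
Appendix B gives for name-start characters (Ll, Lu, Lo, Lt, Nl) and ignores
that appendix's exceptions, so it is not the XML definition.  The function
name is invented.  U+0221 is used only as a probe; it was added after
Unicode 3.2 as far as I know, so the two tables should disagree on it.

```python
import unicodedata

# General categories that XML 1.0 Appendix B lists for name-start
# characters.  The appendix's exceptions (compatibility characters and
# so on) are deliberately left out of this sketch.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}


def is_name_start_by_tables(ch, db=unicodedata):
    """Classify ch purely from whichever Unicode tables db provides."""
    return db.category(ch) in NAME_START_CATEGORIES


if __name__ == "__main__":
    old = unicodedata.ucd_3_2_0  # frozen Unicode 3.2 tables shipped with Python
    print("current tables:", unicodedata.unidata_version)
    for ch in ("\u01F6", "\u0221"):
        print("U+%04X" % ord(ch),
              "3.2:", is_name_start_by_tables(ch, old),
              "current:", is_name_start_by_tables(ch))
```

The answer depends on which version of the tables happens to be installed,
so two parsers that are both conformant could disagree about the same
document.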

> If i'm using XML in a real-world environment, it doesn't matter if XML 1.0
> has been changed to allow some new character if i haven't upgraded my
> Unicode support, and vice versa.

Well, not if you are trying to *render* the new characters.  But
an application that translates XML to some legacy format could cope
with new Unicode characters without change.
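
As a small illustration, and assuming the "legacy format" is plain ASCII
with numeric character references, a translator like the following never
looks at character properties at all, so new Unicode characters pass
through unchanged.  The function name to_legacy_ascii is invented for this
sketch.

```python
def to_legacy_ascii(text):
    """Replace every non-ASCII character with a numeric character reference.

    No character tables are consulted, so characters added by later
    Unicode versions need no code change.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_legacy_ascii("name\u01F6"))  # prints name&#502;
```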

> This would tighten the bond between XML
> and Unicode, since the latter organization couldn't make their changes
> oblivious to their impact upon XML (no insult intended to Unicode, Inc.).
> Since XML is based upon Unicode, XML developers are also, by definition,
> Unicode developers -- these two communities are already interdependent.

Yes, but overly detailed coupling between independently developed
standards makes for tricky management issues.  Particularly because
the Unicode Standard is developed by both the Unicode Consortium
and ISO JTC1/SC2/WG2, coordinating with a third body is probably too tricky.

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin

