OASIS Mailing List Archives

Re: XML Blueberry

Tim Bray scripsit:

> This Blueberry issue is not a slam-dunk either way, it's
> a genuinely hard issue.  


> I'd argue pretty strongly for signaling 
> Blueberry documents with some value of <?xml version="X" ?> 
> where X is not "1.0".

There are arguments, which I won't detail here yet, in favor
of using some other mark for Blueberry documents (such as a
"unicode" pseudo-attribute in the XML declaration).  In any
event, it should be something that will cause XML 1.0 parsers to barf.

> And the cost of this is very high.  At the moment, XML 1.0
> is pretty effectively one thing and just one thing, and if
> I claim to ship out XML and you claim to be able to read it,
> we can usually interoperate, especially since we're both
> probably using expat or xerces or msxml.  Introducing 
> Blueberry will impair this admirable simplicity.

It will.  But because Blueberry is a very limited change,
adding support for it should be a very small effort, not that
different from correcting a bug (and all these parsers have
had bugs at one time or another that affected interop).

> A subsidiary issue, by the way:  If you add NEL to the 
> set of line-end characters there are a bunch of other 
> Unicode space and space-like characters that, to be fair,
> you're going to have to consider adding to the production
> for "S". 

I think there's good reason not to add the various Unicode
space characters.  But adding Unicode LINE SEPARATOR along
with NEL makes good sense to me, and is allowed for.
(Disclaimer: The Core WG might add NEL and not LS,
or vice versa, or both.)
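
To make the line-end point concrete, here is a minimal sketch of what
treating NEL (U+0085) and LINE SEPARATOR (U+2028) as line ends could
look like, alongside the CR and CRLF handling XML 1.0 already has.
The function name and the two flags are hypothetical and exist only to
mirror the "NEL and not LS, or vice versa, or both" possibilities;
nothing here is approved behavior.

```python
import re


def normalize_line_ends(text, nel=True, ls=True):
    """Map every recognized line end in text to a single LF.

    CRLF and bare CR are always normalized, as in XML 1.0.
    nel and ls are hypothetical switches for the two candidate
    additions: NEL (U+0085) and LINE SEPARATOR (U+2028).
    """
    # CRLF must come before bare CR so the pair becomes one LF, not two.
    alternatives = ["\r\n", "\r"]
    if nel:
        alternatives.append("\u0085")
    if ls:
        alternatives.append("\u2028")
    return re.sub("|".join(alternatives), "\n", text)
```

With both flags off this reduces to the 1.0 behavior, which is the
backward-compatible baseline.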

> Then another potential problem: if you decide to push XML
> past version 1.0, why not take the opportunity to pour 
> in namespaces?  And fix the white-space handling?  And...
> well, probably nobody is willing to step off this cliff,
> so maybe I'm raising a red herring.

Exactly.  Blueberry is a *very small* upgrade to XML 1.0.
Really!  Just add a few tables!  Almost nothing to it!
You can get it done in one day!  Etc. etc. etc.

The last thing the Core WG wants is to be deluged with
requests for worthy improvements to XML.  Hence the
meta-requirement to change nothing that we don't need
to change to achieve our stated goals.

> There's a coherent but rather ad-hoc
> (and non-normative) explanation of the choice of XML name 
> characters in one of the appendices.  I believe John Cowan
> suggested that he has a better algorithm/heuristic?  Please
> share. Having a cleaner simpler relationship between XML and 
> Unicode is arguably a good thing.  

Well, yes, but there is IMHO (the Core WG may not agree) a need
for backward compatibility: you should be able to take a random
1.0 document, mark it Blueberry, and it should still work.
Consequently, I don't want to confuse the issue by posting my
whole proposed algorithm, especially since it is not in ANY WAY
approved by the Core WG at present.  But here's an outline of it:


Name starts are upper-case letters (Unicode general category Lu),
lower-case letters (Ll), caseless letters (Lo), modifier letters (Lm),
and numeric letters (Nl -- a handful of oddities), plus ":" and "_".

Name non-start characters are non-spacing combining marks (Mn),
Indic vowel signs (Mc), numeric digits (Nd), and connector
punctuation (Pc), plus "-", ".", and MIDDLE DOT.

Global exclusions are: compatibility characters, characters in
the compatibility areas (F900-FFFD, 2F800-2FFFD), and the new
musical-symbol combining characters (1D165-1D1AD).

There are 33 exceptions to the exclusions: 12 ideographic characters
that are in the compatibility area but aren't compatibility characters,
and 21 characters that don't really belong but are needed for
full backward compatibility with 1.0.
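
The outline above can be sketched in a few lines using Python's
unicodedata module. This is an illustration under stated assumptions,
not the proposed algorithm itself: "compatibility character" is
approximated as "has a tagged (compatibility) decomposition", the 33
exceptions are not listed in this message and so are left as an empty
placeholder, and the results depend on the Unicode version your Python
ships with. All function and constant names are made up for the sketch.

```python
import unicodedata

# General categories from the outline.
START_CATEGORIES = {"Lu", "Ll", "Lo", "Lm", "Nl"}
NON_START_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}

# Individually named characters from the outline.
START_EXTRAS = {":", "_"}
NON_START_EXTRAS = {"-", ".", "\u00B7"}  # U+00B7 is MIDDLE DOT

# Compatibility areas and the musical-symbol combining characters.
EXCLUDED_RANGES = [(0xF900, 0xFFFD), (0x2F800, 0x2FFFD), (0x1D165, 0x1D1AD)]

# Placeholder: the 33 exceptions are not enumerated in the message.
EXCEPTIONS = frozenset()


def is_excluded(ch):
    """True if ch falls under one of the global exclusions."""
    if ch in EXCEPTIONS:
        return False
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES):
        return True
    # Assumption: a tagged decomposition such as "<compat> 0066 0069"
    # marks a compatibility character.
    return unicodedata.decomposition(ch).startswith("<")


def is_name_start(ch):
    if ch in START_EXTRAS:
        return True
    return unicodedata.category(ch) in START_CATEGORIES and not is_excluded(ch)


def is_name_char(ch):
    if is_name_start(ch) or ch in NON_START_EXTRAS:
        return True
    return unicodedata.category(ch) in NON_START_CATEGORIES and not is_excluded(ch)


def is_name(s):
    """True if s is non-empty, begins with a name start, and continues with name characters."""
    return bool(s) and is_name_start(s[0]) and all(is_name_char(c) for c in s[1:])
```

For example, is_name("_a-b.c") holds, while is_name("1abc") does not,
since a digit (Nd) is allowed only after the first character.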

> So I think it would be appropriate, in this discussion,
> to have some people in the mainframe trenches give us
> a briefing on the scale and the difficulty of the problems
> they face, and for some of our i18n gurus to highlight
> the problems faced by an XML language designer who wants
> to use one of the newly-added languages.

I second this.

John Cowan                                   cowan@ccil.org
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter