OASIS Mailing List Archives



Re: Blueberry/Unicode/XML

James Clark wrote:
> > 1. Leave it the way it is.
> > 2. Do Blueberry and then repeat the process for Unicode 3.2
> >    and 4.0 and so on every couple of years forever.
> > 3. Bite the bullet, write the rules in terms of Unicode
> >    metadata and go to a pure use-by-reference architecture,
> >    probably adding a syntactic signal to reference the
> >    Unicode version number.
> I don't find any of these options very appealing.
> Another bullet one could bite is to no longer make checking of name
> characters (beyond what is needed to prevent ambiguity) a part of
> well-formedness.  Whilst it's nice to have some sanity checking of names,
> using inappropriate characters in names doesn't cause problems for further
> processing layers to the same extent as other things that are part of
> well-formedness do, such as unbalanced tags or duplicate attributes.
> At least I think one should consider easing draconian error handling for
> name characters to reduce deployment problems with option 2.

This sounds like an elegant and simple proposal.  It satisfies Tim's desire
to be able to eventually say XML is "complete" and it might actually result
in more efficient parsers if it removes the need for an XML character table.

However, I presume there was a good reason why the current name character
scheme was implemented.  The reasons I can think of are easily dismissed or
dealt with.  Are there any other more serious implications?

- Obviously, certain markup characters must be excluded from names: [ " '
( ) < > & 0x9 0xA 0x20 ...]
- Characters that render as whitespace could be confusing
- Er, any others?
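
To make the relaxed rule concrete, here is a rough sketch in Python of a name check that rejects only the delimiters and whitespace listed above. The function name is_relaxed_name and the EXCLUDED set are made up for illustration and are not taken from the XML specification. The character list above is open-ended, so the set is certainly incomplete, and using str.isspace() to catch "characters appearing as whitespace" is my own assumption.

```python
# Characters listed in the post as markup delimiters or whitespace:
# " ' ( ) < > & 0x9 0xA 0x20. The original list ends with "...",
# so this set is illustrative and deliberately not exhaustive.
EXCLUDED = set("\"'()<>&\t\n ")


def is_relaxed_name(name: str) -> bool:
    """Hypothetical relaxed check: accept any non-empty string that
    contains no listed delimiter and no whitespace-like character."""
    if not name:
        return False
    # str.isspace() is an assumed stand-in for "appears as whitespace";
    # it also catches characters such as U+00A0 (no-break space).
    return not any(ch in EXCLUDED or ch.isspace() for ch in name)


if __name__ == "__main__":
    for candidate in ["title", "名前", "a b", "a<b", "a\u00a0b"]:
        print(repr(candidate), is_relaxed_name(candidate))
```

Note that a check like this needs no per-character table of permitted name characters, only a short deny list, which is the efficiency point made earlier. It does not reproduce the name rules of XML 1.0 in any way.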