Re: Blueberry/Unicode/XML

From: Rick Jelliffe <ricko@allette.com.au>
To: xml-dev@lists.xml.org
Date: Tue, 10 Jul 2001 21:00:43 +0800

> However, I presume there was a good reason why the current name character
> scheme was implemented.  The reasons I can think of are easily dismissed or
> dealt with.  Are there any other more serious implications?

Yes.

0) To catch markup errors.  Anyone using a symbol must think they are in
content, a marked section, an attribute value, a comment or a PI, so if the
symbol turns up in a name, the error is detected closer to its source.

1) To catch transcoding problems. Japanese experts report that this is useful.

2) Promote readable (and read-aloud-able) markup. Symbols are not
appropriate for markup characters. (If you want to use symbols for markup
characters, use ISO SGML's short references or Simon St.Laurent's Regular
Fragmentations.)

3) Minimize the chances of delimiter clashes in languages (hence no ' in
names, even though it would be useful for French; I mean the ASCII
apostrophe character).

4) Prevent the use of symbols which also have similar glyphs that are
letters. For example, the maths x or the maths alpha are symbols, and they
have distinct codes from the letters.  If you used the maths x for whatever
reason, the document might look correct on visual inspection but be coded
wrongly: your name comparisons would fail.

5) There is every chance that people will only use their native scripts when
the document is used regionally or locally. Prudence dictates it. A Cyrillic
user can assume that other readers of the document have Cyrillic alphabet
fonts, but she cannot assume which symbols may be used.

6) There are several function characters and other characters which are not
suitable for markup: for example, the new language-tagging characters from
plane 14, and perhaps even the BIDI characters. (See Unicode TR 20.)  If we
keep them out, we prevent people trying to do strange things; people will
always insist that there is no intent behind a technology and that they can
get by just on the mechanism, so the mechanism needs to hard-code the intent.

7) Because it is not really very expensive to implement.  That said, just
allowing any surrogate without nitpicking is fine, and I agree with James
Clark that perhaps name checking (and normalization) should ultimately be a
different layer from WF.  (I note that James has been thinking about this
issue almost as long as anyone:  he implemented native-language markup
capabilities in SP early on, probably the first person to release a
markup-based application that treated other scripts with equity.)
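
To make points 4 and 6 concrete, here is a minimal Python sketch. It is only
an illustration under stated assumptions: the helper names same_name and
has_format_chars are made up for this message and belong to no XML parser,
and the three sample characters are my own choice of examples. It shows that
a look-alike maths x does not compare equal to the letter x, and that
language-tag and BIDI control characters can be spotted by their Unicode
general category.

```python
import unicodedata

# Sample characters chosen for illustration (assumptions, not from any spec text).
MATH_X = "\U0001D465"  # MATHEMATICAL ITALIC SMALL X, looks like the letter x
TAG_A = "\U000E0061"   # TAG LATIN SMALL LETTER A, a Plane 14 language-tag character
RLM = "\u200F"         # RIGHT-TO-LEFT MARK, a BIDI control character


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: exact code-point comparison of two names."""
    return a == b


def has_format_chars(name: str) -> bool:
    """Hypothetical helper: True if any character has general category Cf."""
    return any(unicodedata.category(ch) == "Cf" for ch in name)


if __name__ == "__main__":
    # Point 4: visually similar, but the code points differ, so matching fails.
    print(same_name("x", MATH_X))
    # Compatibility normalization folds the maths x back to the plain letter.
    print(unicodedata.normalize("NFKC", MATH_X) == "x")
    # Point 6: invisible function characters hiding inside a name.
    print(has_format_chars("item" + TAG_A), has_format_chars("item" + RLM))
```

A check like has_format_chars is the sort of thing that could live in a
separate name-checking layer rather than in well-formedness checking itself.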

Cheers
Rick Jelliffe
