
Re: Well-formed Blueberry



Elliotte,

I am having a real problem understanding why you want to put a wall between
docs requiring blueberry and docs not requiring blueberry. I must be taking
a point of view on version progression that is too naive. The following does
give me a hint:

> further propose that the encoding declaration must be explicit.
> That is, this is malformed even though the default character
> set is UTF-8:

> <?xml version="1.1"?>

However, I am having a hard time figuring out why the standard should treat
authors of nonstandard XML documents better than people who simply want to
use their own language in markup.

On a related point of confusion, you seem to assume in your posts that,
even without your wall, a blueberry capable parser must have both the
pre-blueberry character classification tables and the blueberry character
classification tables. In my naive point of view, a document that is valid
XML 1.0 ought to be valid blueberry; thus the complete table should be the
only necessary table, unless you want to build a wall.

So I must have missed some posts on this subject. Can you direct me to them?

One more thing, to which (single) Japanese character set do you refer below?

Joel Rees
programmer -- rees@mediafusion.co.jp
----------------------------------------------------
To be a tree supporting all information,
  giving root to the chaos
    and branches to the trivia,
      information breathing anew --
        This is the aim of Yggdrasill.
============================XML as Best Solution===
Media Fusion Co. ,Ltd.  株式会社メディアフュージョン
Amagasaki  TEL 81-6-6415-2560    FAX 81-6-6415-2556
    Tokyo TEL 81-3-3516-2566   FAX 81-3-3516-2567
                       http://www.mediafusion.co.jp
===================================================

---------------------------------------------------
Original message:

From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
To: xml-dev@lists.xml.org, www-xml-blueberry-comments@w3.org
Date: Sun, 15 Jul 2001 11:28:08 -0400

----------------------------------------------------------------------------

Some more thoughts on requirements for well-formed Blueberry documents:

If the need for streaming documents makes it seem too problematic that only
documents that use Blueberry name characters be allowed to carry a Blueberry
declaration, then I propose a weaker alternative:

Only documents whose encoding declaration explicitly names a character set
that can include Blueberry characters are allowed to have a Blueberry
declaration. For example,

<?xml version="1.1" encoding="ISO-8859-1"?>

would be malformed. However, these would be well-formed:

<?xml version="1.1" encoding="UTF-8"?>
<?xml version="1.1" encoding="UTF-16"?>
<?xml version="1.1" encoding="UCS-4"?>

(I'm just using version="1.1" here to make my point. The details are not
affected by what the Blueberry declaration eventually looks like.)

I further propose that the encoding declaration must be explicit. That is,
this is malformed even though the default character set is UTF-8:

<?xml version="1.1"?>

My logic is that many authors just write this when what they really mean is
encoding="US-ASCII". I do not think requiring Blueberry documents to
explicitly specify UTF-8 is an onerous burden. Note that this does not
change the default character set for <?xml version="1.1"?>, which would
still be UTF-8.
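
To make the proposed rule concrete, here is a minimal Python sketch of the
check a parser might apply to the XML declaration. Several things are
assumptions for illustration only: version="1.1" stands in for whatever the
Blueberry declaration ends up looking like, the allowed set holds just the
three encodings shown above as well-formed, and the names BLUEBERRY_CAPABLE
and blueberry_declaration_ok are hypothetical.

```python
import re

# Assumption: only the three encodings shown above as well-formed examples.
BLUEBERRY_CAPABLE = {"UTF-8", "UTF-16", "UCS-4"}

# Simplified pattern for an XML declaration: version, then optional
# encoding, then optional standalone. Each value must use matching quotes.
_DECL = re.compile(
    r'^<\?xml\s+version=(["\'])(?P<version>[^"\']+)\1'
    r'(?:\s+encoding=(["\'])(?P<encoding>[^"\']+)\3)?'
    r'(?:\s+standalone=(["\'])(?:yes|no)\5)?'
    r'\s*\?>'
)


def blueberry_declaration_ok(decl: str) -> bool:
    """Return True if the declaration satisfies the proposed rule."""
    m = _DECL.match(decl.strip())
    if m is None:
        return False  # not a recognisable XML declaration
    if m.group("version") != "1.1":
        return True   # not a Blueberry document, so the rule does not apply
    encoding = m.group("encoding")
    if encoding is None:
        return False  # the proposal requires an explicit encoding
    # Encoding names are compared case-insensitively.
    return encoding.upper() in BLUEBERRY_CAPABLE


if __name__ == "__main__":
    for decl in (
        '<?xml version="1.1" encoding="ISO-8859-1"?>',
        '<?xml version="1.1" encoding="UTF-8"?>',
        '<?xml version="1.1"?>',
    ):
        print(decl, blueberry_declaration_ok(decl))
```

Under this sketch the ISO-8859-1 declaration and the bare version="1.1"
declaration are both rejected, while documents that do not carry the
Blueberry declaration pass through unchanged.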

There are not that many encodings that can handle the Blueberry characters,
basically just several variants of Unicode, one Japanese character set, and
possibly a couple of Chinese character sets. Most of the scripts that are at
issue here (Amharic, Khmer, Burmese, etc.) have never had a standard
encoding prior to Unicode. Indeed, that is exactly the reason it took until
Unicode 3.0 to decide how to encode them. It was not possible to simply
transpose an existing national character set. There have been numerous
proposals for alternative encodings of Unicode lately, but all of them have
been shot down with extreme hostility by the Unicode Consortium. Thus I do
not think it would be a huge problem to enumerate all the encodings anybody
is likely to want for Blueberry characters. Certainly, any new encodings
that do arise in the future should be round-trippable to standard Unicode
encodings.
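
One rough way to see which encodings can carry such characters is to try a
lossless round trip through each one. The Python sketch below does that for
one sample letter from each of the Ethiopic (Amharic), Khmer, and Myanmar
(Burmese) blocks. The helper name can_round_trip and the choice of sample
characters are mine, for illustration only, and the test covers only the
codecs that Python itself ships.

```python
def can_round_trip(text: str, encoding: str) -> bool:
    """Return True if text survives encode/decode in the given encoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding has no representation for at least one character.
        return False


# One sample letter per script: Ethiopic HA, Khmer KA, Myanmar KA.
SAMPLE = "\u1200\u1780\u1000"

if __name__ == "__main__":
    for name in ("utf-8", "utf-16", "utf-32", "iso-8859-1", "ascii"):
        print(name, can_round_trip(SAMPLE, name))
```

The Unicode encodings pass and ISO-8859-1 and US-ASCII fail, which is the
distinction the proposed rule relies on.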
--

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+