
Re: participating communities (was XML Blueberry)


This may not be a good admission to make, but I look around the office and I
see fifteen or so Japanese programmers working with XML, but writing the
documents, tags and all, in (shift-)JIS encoded text and not worrying too
much about the consequences. (We are using appropriate declarations.)

If I understand what Mr. Murata says, we are probably already regularly
using some characters that are supposed to map to code-points in the
extension plane.
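
A minimal sketch of how one might check this, assuming that "extension plane" means code points above U+FFFF (outside the Basic Multilingual Plane). The function name and the sample strings are hypothetical, chosen only for illustration, and the check says nothing about the separate question of which characters XML 1.0 allows in names.

```python
def chars_outside_bmp(name):
    """Return the characters of `name` whose code points lie above U+FFFF."""
    return [ch for ch in name if ord(ch) > 0xFFFF]


# Hypothetical element names, written as escapes so the source stays ASCII.
bmp_only = "\u8eca"                # U+8ECA, inside the BMP
with_ext_b = "\U0002000b\u5370"    # U+2000B is in CJK Extension B (plane 2)

print(chars_outside_bmp(bmp_only))
print([hex(ord(ch)) for ch in chars_outside_bmp(with_ext_b)])
```

Running something like this over the tag and attribute names in a set of documents, after decoding them from their declared encoding, would show whether any such characters are actually in use.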

If I extrapolate, that makes several thousand Japanese users of XML
(programmers and document writers) that are going to be forced to be very
careful of which characters they use in tags and attribute names, sometimes
being forced to use a character with different meaning because the right one
is not available.

Or we are already developing an incompatible document context for XML.

Concerning using kana instead of kanji, as someone already pointed out, it
is not really acceptable. There are too many homonyms in Japanese. Even with
words like "kuruma" (wheel, vehicle), there is not just one kanji, and the
shade of meaning may be significant. With words like "gi" (righteousness,
duty, job, clothing, false, fake, . . . ) even the context may not tell you
which kanji was intended. (If you know this, I apologize for being

As to the relative value of these characters, and the relative loss if they
are not allowed in tag names or attribute names, I can't make a judgement.
Most of the people who are qualified to make such a judgement are not on
this list, because (contrary to popular myth) English takes significantly
more effort and time for most of them to read. (We're talking about
technically capable people who would be hard pressed to find a couple of
hours a day to stay fluent in English, although this list would be a useful
tool in that regard. They already have precious little time for their
families as it is.)

So I defer to Mr. Murata, and I hope you don't discount his opinions simply
because his mode of expression is not as direct as a typical American's
would be.

Joel Rees
programmer -- rees@mediafusion.co.jp
To be a tree supporting all information,
  giving root to the chaos
    and branches to the trivia,
      information breathing anew --
        This is the aim of Yggdrasill.
============================XML as Best Solution===
Media Fusion Co. ,Ltd.  株式会社メディアフュージョン
Amagasaki  TEL 81-6-6415-2560    FAX 81-6-6415-2556
    Tokyo TEL 81-3-3516-2566   FAX 81-3-3516-2567

Elliotte Rusty Harold further clarified:

> At 1:21 PM -0400 7/10/01, John Cowan wrote:
> >I say, and you have implicitly conceded, that *need* is an
> >inappropriate standard: nobody (or almost nobody) *needs* more than
> >Latin, or indeed ASCII.  It's what people *want* that counts.
> >
> I have conceded no such thing. I suppose ultimately the difference between
> want and need is a matter of degree and semantics. I will note that there
> are many characters and character sets people have wanted in Unicode, that
> the Unicode consortium has rejected because they aren't needed. One example
> is Klingon. This was rejected not because it's fictional, but because the
> user community had a well-established Latin transliteration for the script
> that they were accustomed to using. But so far I haven't even seen proof
> that users want Blueberry, much less need it.
> >It may be that people who wanted their native scripts encoded in
> >Unicode (if they hadn't, Unicode surely wouldn't have encoded
> >them) may in fact not want to use those scripts in native-language
> >markup.  But the burden of persuasion for such an extraordinary claim
> >is on the claimant.
> >
> I disagree completely. I think the burden of proof has to be on the
> claimant who wants to break the entire existing installed base of XML
> software and systems. After all, the mere existence of some advantage to the
> change is not sufficient to justify it. This cannot be decided on the basis
> of the hypothesized advantages alone. These must be weighed against the very
> real disadvantages, and that can only be done if we know how big the likely
> user community is and how often they will use these characters in XML names.
> Only if the need for these characters is great enough should the costs be
> If there are thousands of Ethiopian or Burmese or Khmer speakers clamoring
> to write markup in their native script, then the change would have large
> advantages that might be worth the cost. On the other hand, if there are a
> few dozen native speakers who think it might be nice to have this, but who
> are probably going to use Docbook and XHTML 99% of the time anyway, then I
> don't think the advantages are big enough. I honestly don't know where the
> truth lies. I do know the costs of the update will be huge. I am not willing
> to assume the advantages are so big they outweigh the costs until proven
> --
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
> |              http://www.ibiblio.org/xml/books/bible2/              |
> |   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
> |  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
> +----------------------------------+---------------------------------+