> I am having a hard time accepting the case for mixed content, especially
> based on the arguments I have seen.
One important reason is internationalization.
Japanese, in particular, has too many homophones and variant readings
for either syllabically spelled words or ideographically written
characters to be completely satisfactory. The common
"writing-on-the-hand-when-talking" behaviour that strikes foreigners
in Japan is evidence of this.
To overcome this, the Japanese have adopted a system of annotated
writing, which we can call ruby (after the printers' name for a very
small type size, about 5.5 point). These annotations allow ideographs
(whose meaning may be readable but pronunciation unclear) to be coupled
with their phonetic spelling. They also allow contractions to be
spelled out, or even little translations of unusual foreign words or
names to be given in the text.
Similar annotations are used in Taiwan, written in bopomofo, the
syllabary used for teaching children and for glossing rare ideographs.
One of the promises of XML over 3rd normal form data is therefore
that mixed content provides a way for Japanese people (and others) to
use their traditional solution (ruby annotations) and overcome the
alphabet-centrism of RDBMS and third normal form.
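
As a rough illustration of the point above, here is a minimal sketch of a
text field carrying ruby annotations as mixed content. The element names
ruby, rb and rt are borrowed from XHTML ruby markup; the title field and
the two helper functions are hypothetical, made up for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical record field: a <title> whose content mixes plain characters
# with <ruby> annotations (element names borrowed from XHTML ruby markup).
FIELD = (
    "<title>"
    "<ruby><rb>東京</rb><rt>とうきょう</rt></ruby>"
    "の"
    "<ruby><rb>地図</rb><rt>ちず</rt></ruby>"
    "</title>"
)


def base_text(elem):
    """Return the field as plain text, dropping the phonetic readings."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "ruby":
            # Keep only the annotated base characters.
            parts.append(child.findtext("rb", default=""))
        else:
            parts.append("".join(child.itertext()))
        # Text that follows the child element belongs to the parent field.
        parts.append(child.tail or "")
    return "".join(parts)


def readings(elem):
    """Return (base, reading) pairs for every ruby annotation in the field."""
    return [
        (r.findtext("rb", default=""), r.findtext("rt", default=""))
        for r in elem.iter("ruby")
    ]


title = ET.fromstring(FIELD)
print(base_text(title))
print(readings(title))
```

A flat string column can hold either the base text or the readings, but
not the pairing between them; the mixed-content field keeps both.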
Some internationalization people even go as far as saying that *all*
text in a schema intended for international use should be mixed
content. That is, the XML Schema string type should be the exception,
used only when a pattern facet disallows Han ideographs.
Obviously, this can freak out RDBMS people. But why should East Asians
settle for text in databases being less comprehensible than free text,
a handicap that users of alphabetic scripts do not face?
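
The "plain string only when Han ideographs are excluded" rule mentioned
above can be approximated in a few lines. This is only a sketch under a
loud assumption: "Han ideograph" is taken to mean two Unicode blocks (CJK
Unified Ideographs and Extension A), which is narrower than the full Han
repertoire. The function name is hypothetical.

```python
import re

# Assumption: Han ideographs are approximated by two Unicode blocks,
# CJK Unified Ideographs (U+4E00-U+9FFF) and Extension A (U+3400-U+4DBF).
# Other extension blocks and compatibility ideographs are not covered.
HAN = re.compile("[\u3400-\u4dbf\u4e00-\u9fff]")


def fits_plain_string(value: str) -> bool:
    """True if the value contains no Han characters from the blocks above,
    so a string type restricted to exclude them would still accept it."""
    return HAN.search(value) is None


print(fits_plain_string("Tokyo"))
print(fits_plain_string("とうきょう"))
print(fits_plain_string("東京"))
```

Values that fail this check are the ones that would need a mixed-content
field, so that a reading can be attached where the writer thinks it helps.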