OASIS Mailing List Archives

Re: [xml-dev] Correct xml:lang value for Pinyin Chinese vs Simplified Chinese

Rick Jelliffe scripsit:

> Are you sure you have the right terms here?  Pinyin is not pidgin.


> And it usually has no accents. (If it has accents, in particular macrons,
> it may not be standard Pinyin, which is not to say that it might not
> be an old or extended Pinyin.)

Standard Hànyǔ pīnyīn (汉语拼音) as used by the PRC, Singapore,
and ROC governments, and standardized as ISO 7098:1982, definitely does
have accents: one for each syllable (except for the toneless syllables),
as shown in this sentence.

> Language codes are in flux: the three letter codes and the two letter
> codes have different approaches. 

Three-letter codes are never used for languages that have two-letter codes.
Chinese as a whole has the two-letter code "zh", whereas Mandarin proper
has the three-letter code "cmn".  For backward compatibility, "zh-cmn"
also designates Mandarin.

> So first you need to determine the
> region: is your simplified text from PRC or Singapore?
> Assuming it is from PRC, then  the language code  zh-CN should be
> enough AFAIK.

There are texts from the PRC in traditional characters.  "zh-Hans" is
the modern standard form for simplified-character texts whether from the
PRC or Singapore or elsewhere.  "zh-CN" usually means the same thing,
but it is a backward compatibility hack.

> Note that there is (or should be) no need to specify anything about
> the script if you are just marking up existing text. @xml:lang
> specifies the language, and the script only indirectly because a
> language+region often has a standard or characteristic orthography:
> the general script being used is obvious from the characters
> themselves.

You're out of date here.  xml:lang definitely can specify script, though
it is not required to.
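
To make that concrete, here is a minimal sketch in Python of reading
xml:lang values that carry a script subtag.  The sample document, its
element names, and the helper script_subtag() are all made up for
illustration; the helper is a rough heuristic (first four-letter
alphabetic subtag before any singleton), not a full tag parser.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace, so ElementTree
# exposes xml:lang under this expanded attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document: element names are invented for this sketch.
DOC = """<phrases>
  <phrase xml:lang="zh-Hans">汉语拼音</phrase>
  <phrase xml:lang="zh-Hant">漢語拼音</phrase>
  <phrase xml:lang="zh-Latn">Hànyǔ pīnyīn</phrase>
</phrases>"""

def script_subtag(tag):
    """Return the four-letter script subtag of a language tag, or None.

    Heuristic only: scans the subtags after the primary language and
    stops at the first singleton (such as "x"), where extensions or
    private-use subtags begin.
    """
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 1:
            break
        if len(subtag) == 4 and subtag.isalpha():
            return subtag.title()
    return None

root = ET.fromstring(DOC)
# Map each phrase's text to the script named in its xml:lang value.
scripts = {p.text: script_subtag(p.get(XML_LANG)) for p in root.iter("phrase")}
```

A tag without a script subtag, such as "zh-CN", simply yields None here,
which matches the point that the script is optional.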

> So you could use  xml:lang="zh-CN"  for all the three cases you
> mention. If you wanted to give more of a hint, you could try
> xml:lang="zh-CN-pinyin" or  "zh-Latn-CN-pinyin"  for the standard
> pinyin,  and  xml:lang="zh-CN-pinyin-adhoc" or "zh-Latn-CN-adhoc" for
> the non-standard one (where "adhoc" is some phrase you pick to
> indicate an extended pinyin or mystery format.)

I assumed that the OP wanted to have a distinct tag for each case.
If you are going to use something ad hoc, it must be introduced by the
private-use singleton "x", i.e. take the form "x-adhoc".
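
As a rough sketch of that rule, assuming the RFC 4646 syntax in which
private-use subtags follow an "x" singleton and are each 1 to 8 ASCII
letters or digits, something like the following Python would build and
inspect such tags.  The function names are hypothetical, and the base tag
itself is not validated.

```python
import re

# Assumed private-use subtag shape: 1 to 8 ASCII letters or digits.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")

def with_private_use(base, *names):
    """Append private-use subtags to a base tag after an "x" singleton."""
    if not names:
        raise ValueError("at least one private-use subtag is required")
    for name in names:
        if not _SUBTAG.fullmatch(name):
            raise ValueError(f"not a valid private-use subtag: {name!r}")
    return "-".join([base, "x", *names])

def private_use_subtags(tag):
    """Return the subtags that follow the "x" singleton, or an empty list."""
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    if "x" not in lowered:
        return []
    return parts[lowered.index("x") + 1:]

tag = with_private_use("zh-Latn-CN", "adhoc")
```

With this, "zh-Latn-CN-adhoc" has no private-use part at all, whereas
"zh-Latn-CN-x-adhoc" carries "adhoc" as a private-use subtag.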

Even a refrigerator can conform to the XML      John Cowan
Infoset, as long as it has a door sticker       cowan@ccil.org
saying "No information items inside".           http://www.ccil.org/~cowan
        --Eve Maler
