Re: [xml-dev] Correct xml:lang value for Pinyin Chinese vs Simplified Chinese
- From: John Cowan <cowan@mercury.ccil.org>
- To: Rick Jelliffe <rjelliffe@allette.com.au>
- Date: Mon, 27 Feb 2012 11:55:30 -0500
Rick Jelliffe scripsit:
> Are you sure you have the right terms here? Pinyin is not pidgin.
True.
> And it usually has no accents. (If it has accents, in particular macrons,
> it may not be standard Pinyin, which is not to say that it might not
> be an old or extended Pinyin.)
Standard Hànyǔ pīnyīn (汉语拼音) as used by the PRC, Singapore,
and ROC governments, and standardized as ISO 7098:1982, definitely does
have accents: one for each syllable (except for the toneless syllables),
as shown in this sentence.
> Language codes are in flux: the three letter codes and the two letter
> codes have different approaches.
In language tags, three-letter codes are never used for languages that
have two-letter codes.
Chinese as a whole has the two-letter code "zh", whereas Mandarin proper
has the three-letter code "cmn". For backward compatibility, "zh-cmn"
also designates Mandarin.
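
As a rough illustration of the point about "zh-cmn", here is a minimal
Python sketch that rewrites the older spelling to the bare "cmn" subtag.
The helper name and the one-entry table are hypothetical and exist only
for this example; a real implementation would consult the full IANA
Language Subtag Registry rather than a hand-written mapping.

```python
# Hypothetical table: only the single pair discussed above is listed.
# Keys are the older "zh-<extlang>" spellings, values the bare subtag.
EXTLANG_FORMS = {"zh-cmn": "cmn"}


def prefer_primary_subtag(tag: str) -> str:
    """Rewrite a leading 'zh-cmn' to 'cmn', keeping any later subtags."""
    lowered = tag.lower()
    for old, new in EXTLANG_FORMS.items():
        if lowered == old:
            return new
        if lowered.startswith(old + "-"):
            # Keep the rest of the tag (script, region, ...) untouched.
            return new + tag[len(old):]
    return tag


print(prefer_primary_subtag("zh-cmn"))       # bare Mandarin tag
print(prefer_primary_subtag("zh-cmn-Hans"))  # Mandarin, simplified script
print(prefer_primary_subtag("zh-Hans"))      # left unchanged
```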
> So first you need to determine the
> region: is your simplified text from PRC or Singapore?
>
> Assuming it is from PRC, then the language code zh-CN should be
> enough AFAIK.
There are texts from the PRC in traditional characters. The tag "zh-Hans"
is the modern standard form for simplified-character texts whether from
the PRC or Singapore or elsewhere. The tag "zh-CN" usually means the same
thing, but it is a backward compatibility hack.
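
To make the distinction concrete, the sketch below marks up two
paragraphs by script rather than by region and reads the tags back with
Python's standard xml.etree.ElementTree. The element names are invented
for the example, and "zh-Hant" (traditional characters) is the usual
counterpart of "zh-Hans" rather than something taken from this thread.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative document: <doc> and <p> are made-up element names.
doc = ET.fromstring(
    "<doc>"
    '<p xml:lang="zh-Hans">汉语拼音</p>'
    '<p xml:lang="zh-Hant">漢語拼音</p>'
    "</doc>"
)

# Collect the language tag of every paragraph, in document order.
langs = [p.get(XML_LANG) for p in doc.iter("p")]
print(langs)
```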
> Note that there is (or should be) no need to specify anything about
> the script if you are just marking up existing text. @xml:lang
> specifies the language, and the script only indirectly because a
> language+region often has a standard or characteristic orthography:
> the general script being used is obvious from the characters
> themselves.
You're out of date here. xml:lang definitely can specify script, though
it is not required to.
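
For anyone who wants to check whether a given xml:lang value carries a
script, here is a simplified Python sketch. It relies on the fact that a
script subtag is four ASCII letters and appears before any single-letter
subtag; it does not validate the tag as a whole, and the function name is
only an illustration.

```python
from typing import Optional


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag of a language tag, or None if absent.

    Simplified: assumes the tag is otherwise well formed.
    """
    subtags = tag.split("-")
    for sub in subtags[1:]:          # skip the primary language subtag
        if len(sub) == 1:
            # A single-letter subtag starts an extension or private-use
            # section, where four-letter subtags are not scripts.
            break
        if len(sub) == 4 and sub.isascii() and sub.isalpha():
            return sub.title()       # conventional casing, e.g. "Hans"
    return None


for tag in ("zh-Hans", "zh-CN", "zh-Latn-CN-pinyin"):
    print(tag, script_subtag(tag))
```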
> So you could use xml:lang="zh-CN" for all the three cases you
> mention. If you wanted to give more of a hint, you could try
> xml:lang="zh-CN-pinyin" or "zh-Latn-CN-pinyin" for the standard
> pinyin, and xml:lang="zh-CN-pinyin-adhoc" or "zh-Latn-CN-adhoc" for
> the non-standard one (where "adhoc" is some phrase you pick to
> indicate an extended pinyin or mystery format.)
I assumed that the OP wanted to have a distinct tag for each case.
If you are going to use something ad hoc, it must take the form "x-adhoc".
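
A minimal Python sketch of that rule follows: it appends private-use
subtags after an "x" singleton at the end of a tag and rejects subtags
that are not one to eight ASCII letters or digits. The function name and
the base tag "zh-Latn" are assumptions made for the example.

```python
import re

# A private-use subtag is 1 to 8 ASCII letters or digits.
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def with_private_use(base: str, *subtags: str) -> str:
    """Append private-use subtags to a tag, e.g. 'zh-Latn' + 'adhoc'."""
    for sub in subtags:
        if not _PRIVATE_SUBTAG.fullmatch(sub):
            raise ValueError(f"not a valid private-use subtag: {sub!r}")
    if not subtags:
        return base
    # The "x" singleton introduces the private-use section.
    return "-".join([base, "x", *subtags])


print(with_private_use("zh-Latn", "adhoc"))
```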
--
John Cowan    cowan@ccil.org    http://www.ccil.org/~cowan
Even a refrigerator can conform to the XML Infoset, as long as it has
a door sticker saying "No information items inside".  --Eve Maler