OASIS Mailing List Archives

Re: [xml-dev] Correct xml:lang value for Pinyin Chinese vs Simplified Chinese

Are you sure you have the right terms here?  Pinyin is not pidgin. And
it usually has no accents. (If it has accents, in particular macrons,
it may not be standard Pinyin, which is not to say that it might not
be an old or extended Pinyin.)

Language codes are in flux: the three letter codes and the two letter
codes have different approaches. The two letter codes plus regional
variant may still be safest.  So first you need to determine the
region: is your simplified text from PRC or Singapore?

Assuming it is from the PRC, the language code zh-CN should be enough AFAIK.

Note that there is (or should be) no need to specify anything about
the script if you are just marking up existing text. @xml:lang
specifies the language, and the script only indirectly because a
language+region often has a standard or characteristic orthography:
the general script being used is obvious from the characters themselves.
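
To illustrate the point that the script can be read off the characters, here is a rough Python sketch. The function name dominant_script and the labels "Hani", "Latn" and "other" are my own choices for illustration, and matching on Unicode character names is only a crude heuristic, not a real script-detection algorithm.

```python
import unicodedata

def dominant_script(text):
    """Return a rough script label ("Hani", "Latn" or "other") for text.

    Hypothetical helper: it only looks at Unicode character names,
    which is enough to tell Han ideographs from Latin letters.
    """
    counts = {"Hani": 0, "Latn": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation and whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            counts["Hani"] += 1
        elif name.startswith("LATIN"):
            counts["Latn"] += 1
    if counts["Hani"] == 0 and counts["Latn"] == 0:
        return "other"  # no Han or Latin letters seen
    return max(counts, key=counts.get)

print(dominant_script("中文"))
print(dominant_script("zhong1 wen2"))
```

Under this heuristic, pinyin with or without accents or tone digits counts as Latin script, and that is all a consumer needs to know in order to tell it apart from text in Chinese characters carrying the same xml:lang.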

So you could use xml:lang="zh-CN" for all three cases you
mention. If you wanted to give more of a hint, you could try
xml:lang="zh-CN-pinyin" or "zh-Latn-CN-pinyin" for the standard
pinyin, and xml:lang="zh-CN-pinyin-adhoc" or "zh-Latn-CN-adhoc" for
the non-standard one (where "adhoc" is some subtag you pick to
indicate an extended pinyin or mystery format).
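
As a sketch of how such markup might be consumed, the Python below reads xml:lang from a small sample document and splits each tag into subtags. The sample document, the element names doc and p, and the helper names list_langs and split_tag are all invented for illustration. The splitting is a simplification of the RFC 4646 syntax (no extlang, extension or private-use handling), and it does not check whether any subtag, "adhoc" in particular, is actually registered with IANA.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
LANG_ATTR = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one default language, two more specific overrides.
SAMPLE = """<doc xml:lang="zh-CN">
  <p>中文</p>
  <p xml:lang="zh-Latn-CN-pinyin">Zhongwen</p>
  <p xml:lang="zh-Latn-CN-adhoc">Zhong1wen2</p>
</doc>"""

def list_langs(root):
    """Return (text, tag) pairs; a <p> without xml:lang inherits the root's."""
    default = root.get(LANG_ATTR)
    return [(p.text, p.get(LANG_ATTR, default)) for p in root.findall("p")]

def split_tag(tag):
    """Split a tag into language, script, region and variants (simplified)."""
    parts = tag.split("-")
    out = {"language": parts[0].lower(), "script": None,
           "region": None, "variants": []}
    for part in parts[1:]:
        before_region = out["region"] is None and not out["variants"]
        is_region = (len(part) == 2 and part.isalpha()) or \
                    (len(part) == 3 and part.isdigit())
        if (len(part) == 4 and part.isalpha()
                and out["script"] is None and before_region):
            out["script"] = part.title()      # e.g. Latn
        elif is_region and before_region:
            out["region"] = part.upper()      # e.g. CN
        else:
            out["variants"].append(part.lower())  # e.g. pinyin, adhoc
    return out

for text, tag in list_langs(ET.fromstring(SAMPLE)):
    print(text, tag, split_tag(tag))
```

A consumer that only cares about the language and region can ignore the script and variant fields, so the more detailed tags should degrade gracefully to plain zh-CN.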

(I suspect the transliterated Chinese with accented roman characters
would not be a legitimate zh-Latn-CN (I'd expect John Cowan to be on
top of this), but if it were, then that would probably be the best for
the non-standard transliteration.)

If you want to mark up your text so that screen readers can read it,
then find the website for the screen reader, contact the developers,
and ask them. I doubt if the non-standard pinyin would have specialist
readers that can understand it in any case (though IIRC there was a
reader that understood 1,2,3,... tone digits in with pinyin or

For more info, see
You could track down the current IANA registrations for
http://www.ietf.org/rfc/rfc4646.txt too, I guess.

Rick Jelliffe


Copyright 1993-2007 XML.org. This site is hosted by OASIS