OASIS Mailing List Archives

   XML Core WG needs input on xml:lang=""


The W3C XML Core WG has decided to allow the value of xml:lang, the
attribute for indicating the natural language of character data, to
be an empty string in order to allow the explicit expression of
language-less text inside language-marked text.  Here's an example:

<p xml:lang="en">
  Here is an example of some C code:
  <pre xml:lang="">
     #include "stdio.h"
     main() {printf("Hello world!");}
  </pre>
</p>

By the present rules, the pre element inherits the language declared on
its parent, and there is no way to express the fact that its content is
not in English.  (Computer languages are out of scope for RFC 3066 and
have no codes.)
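
The inheritance behavior can be sketched in a few lines of Python.  This
is only an illustration, assuming the standard-library ElementTree parser;
the function name effective_langs and the extra em element are made up
for the example and are not part of any specification.

```python
import xml.etree.ElementTree as ET

# Clark-notation name under which ElementTree reports the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_langs(elem, inherited=None, out=None):
    """Return (tag, language) pairs in document order.

    language is None when nothing is declared on the element or any
    ancestor, "" when the nearest declaration is the empty string
    (no natural language), and the inherited tag otherwise.
    """
    if out is None:
        out = []
    # An xml:lang on the element itself overrides the inherited value,
    # including when that value is the empty string.
    lang = elem.get(XML_LANG, inherited)
    out.append((elem.tag, lang))
    for child in elem:
        effective_langs(child, lang, out)
    return out

# The em element is a hypothetical addition to show ordinary inheritance.
doc = ET.fromstring(
    '<p xml:lang="en">Here is an example of some C code:'
    '<pre xml:lang="">main() {}</pre><em>and more English</em></p>'
)
result = effective_langs(doc)
# result == [("p", "en"), ("pre", ""), ("em", "en")]
```

Without the empty-string value, the only way to stop pre from reporting
"en" would be to assign it some other language tag, which would be wrong.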

However, the WG is divided on the question of whether to issue an
erratum to XML 1.0 or to make this provision part of XML 1.1.

Argument for XML 1.1:  It is a new feature and as such belongs in XML 1.1,
which we are conveniently issuing shortly anyway.

Argument for erratum:  It is just a single new allowed value for an attribute
that already got a whole lot of new values when we upgraded (by existing
erratum E11) from the obsolete RFC 1766 to the current RFC 3066.
For example, "haw" was an illegal tag under 1766, but refers to the
Hawaiian language now.

Note: The XML Schema Datatypes document still references the obsolete RFC,
but defers to XML 1.0 Second Edition for the exact rules, so an erratum would immediately
allow the empty string in objects of type xsd:language; an XML 1.1
change would not immediately allow it.

Note: Any application that processes xml:lang has to already be prepared
for thousands of legal values, most of which it will not understand.
For example, de-jp is legal, symbolizing the variety of German spoken and
written in Japan, whatever that might be.
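
For what it is worth, a purely syntactic check along these lines might
look like the sketch below.  It assumes the RFC 3066 grammar (a primary
subtag of 1 to 8 letters, followed by hyphen-separated subtags of 1 to 8
letters or digits) and treats the empty string as the proposed extra
value.  The names RFC3066_TAG and is_acceptable_xml_lang are hypothetical,
and the check does not consult any registry of assigned codes.

```python
import re

# RFC 3066 syntax only: primary subtag of 1-8 letters, then zero or more
# "-"-separated subtags of 1-8 letters or digits.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def is_acceptable_xml_lang(value, allow_empty=True):
    """Return True if value is syntactically usable as an xml:lang value.

    allow_empty models the proposed change: when True, the empty string
    is accepted as meaning "not in a natural language".
    """
    if value == "":
        return allow_empty
    return RFC3066_TAG.fullmatch(value) is not None

checks = {tag: is_acceptable_xml_lang(tag) for tag in ["haw", "de-jp", "und", "", "en_US"]}
```

An application that already accepts any well-formed tag needs only the
one extra branch for the empty string.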

Note: The existing code "und" is not synonymous with the proposed use of the
empty string.  The "und" code means that the text is in some natural language,
but we don't know which one; the empty string means that the text is not
in a natural language.
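
To make the distinction concrete, here is a small sketch of how an
application might classify the three cases.  The function name
describe_lang and the wording of its return values are invented for
illustration only.

```python
def describe_lang(value):
    """Classify an effective xml:lang value (None means never declared)."""
    if value is None:
        return "no language information"
    if value == "":
        # Proposed meaning of the empty string.
        return "not in a natural language"
    if value.lower() == "und":
        # Existing code: some natural language, but which one is unknown.
        return "natural language, undetermined"
    return "natural language: " + value

labels = [describe_lang(v) for v in (None, "", "und", "en")]
```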

Disclosure: I personally favor issuing an erratum.

Please send comments on the question "erratum vs. XML 1.1" to
xml-editor@w3.org, which is also copied on this mail.

-- 
John Cowan
        jcowan@reutershealth.com
                I am a member of a civilization. --David Brin




 


Copyright 2001 XML.org. This site is hosted by OASIS