Re: [xml-dev] Allowed characters for NCName



Something is strange, as dotless i is U+0131, which does have a Unicode
letter class and is allowed in XML names.
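
One quick way to double-check that claim is to ask a Unicode database
directly. The following is only a sketch using Python's standard
unicodedata module (not something from the original thread); it reports
whatever Unicode version the Python installation ships, which is newer
than the version the XML spec was written against.

```python
import unicodedata

# U+0131 is the dotless i discussed above.
dotless_i = "\u0131"

# Print the code point, its Unicode name and its general category.
print("U+%04X" % ord(dotless_i),
      unicodedata.name(dotless_i),
      unicodedata.category(dotless_i))
```

The category printed is one of the letter classes the XML spec accepts
for names.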


> The links show the characters in Hexadecimal. Is there anywhere that
> actually displays the list of characters?

It's quite a long list.


There are plenty of places where you can look up the Unicode names of
characters, the default places being

http://www.unicode.org/charts/
or
http://www.unicode.org/ucd/

(for PDF charts, or textual tables and documentation, respectively)


However, I find it useful to have the information available as XML.

The following XQuery, for example, returns the Unicode number and name
of all characters in Unicode 3.0 that have the Lu or Ll (upper- or
lower-case letter) character class.

saxon9q -u -s http://www.w3.org/2003/entities/2007xml/unicode.xml
"{//character[unicodedata/@category=('Ll','Lu')][number(description/@unicode)<=3]/(string(@id),string(description),'&#10;')}"

You might want to fetch the file and run the query locally: unicode.xml
is 5.6 MB in size, and the above returns lots of lines, starting with:
 U00041 LATIN CAPITAL LETTER A 
 U00042 LATIN CAPITAL LETTER B 
 U00043 LATIN CAPITAL LETTER C 
 U00044 LATIN CAPITAL LETTER D 
 U00045 LATIN CAPITAL LETTER E 
 U00046 LATIN CAPITAL LETTER F 
 U00047 LATIN CAPITAL LETTER G 
 U00048 LATIN CAPITAL LETTER H 
 U00049 LATIN CAPITAL LETTER I 
 U0004A LATIN CAPITAL LETTER J 

As explained at
http://www.w3.org/TR/REC-xml/#NT-Letter
other character classes need to be included (Ll, Lu, Lo, Lt, Nl, Mc,
Me, Mn, Lm, Nd) and some character ranges are excluded. The XPath above
could be adjusted accordingly (or the character range list given in the
XML spec could be made into a regexp), but the above is probably about
as long as you'd want to type on the command line rather than putting
the XQuery/XPath into a file.
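
For anyone who would rather keep this in a file, here is a rough sketch
of the same category filter in Python instead of XQuery. It is an
illustration only: it uses the standard unicodedata module (so it
reflects the Unicode version bundled with Python, not Unicode 3.0), it
only looks at code points below U+10000, it does not apply the excluded
ranges the spec mentions, and the names NAME_START_CATEGORIES,
NAME_EXTRA_CATEGORIES and characters_in are made up for this example.
The split into "name start" and "other name character" categories
follows my reading of the spec's character class notes.

```python
import unicodedata
from itertools import islice

# Categories listed above; the split into two sets is an assumption
# based on the XML spec's character class notes.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_EXTRA_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def characters_in(categories, limit=0x10000):
    """Yield (id, name) for each code point below `limit` in `categories`."""
    for cp in range(limit):
        ch = chr(cp)
        if unicodedata.category(ch) in categories:
            # Code points without a name fall back to an empty string.
            yield "U%05X" % cp, unicodedata.name(ch, "")


if __name__ == "__main__":
    # Same filter as the query above (Lu or Ll), first ten results only.
    for char_id, name in islice(characters_in({"Lu", "Ll"}), 10):
        print(char_id, name)
```

Passing NAME_START_CATEGORIES | NAME_EXTRA_CATEGORIES instead of
{"Lu", "Ll"} gives the broader list, still without the spec's
exclusions.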


David


