Re: [xml-dev] Allowed characters for NCName
- From: David Carlisle <davidc@nag.co.uk>
- To: desmond.kirrane@googlemail.com
- Date: Thu, 13 Dec 2007 12:52:01 GMT
Something is strange, as dotless i is U+0131, which does have a Unicode
letter class and is allowed in XML names.
> The links show the characters in Hexadecimal. Is there anywhere that
> actually displays the list of characters?
It's quite a long list.
There are plenty of places where you can look up the Unicode names of
characters, the default places being
http://www.unicode.org/charts/
or
http://www.unicode.org/ucd/
(for pdf charts or textual tables and documentation respectively)
However, I find it's useful to have the information available as XML.
The following XQuery, for example, returns the Unicode number and name of
all characters in Unicode 3.0 that have the Lu or Ll (upper or lower case
letter) character class.
saxon9q -u -s http://www.w3.org/2003/entities/2007xml/unicode.xml
"{//character[unicodedata/@category=('Ll','Lu')][number(description/@unicode)<=3]/(string(@id),string(description),' ')}"
You might want to fetch the file and run the query locally: unicode.xml
is 5.6 MB in size, and the above returns lots of lines, starting
U00041 LATIN CAPITAL LETTER A
U00042 LATIN CAPITAL LETTER B
U00043 LATIN CAPITAL LETTER C
U00044 LATIN CAPITAL LETTER D
U00045 LATIN CAPITAL LETTER E
U00046 LATIN CAPITAL LETTER F
U00047 LATIN CAPITAL LETTER G
U00048 LATIN CAPITAL LETTER H
U00049 LATIN CAPITAL LETTER I
U0004A LATIN CAPITAL LETTER J
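If no XQuery processor is to hand, a similar listing can be sketched in
Python. This is only a rough equivalent, not the query above: it reads
the Unicode database bundled with the Python interpreter (whatever
version that happens to be, not Unicode 3.0 and not unicode.xml), and
the function name letters is made up for this example.

```python
import itertools
import sys
import unicodedata


def letters(categories=("Lu", "Ll")):
    """Yield 'Uxxxxx NAME' lines for characters in the given categories.

    Assumption: uses Python's bundled Unicode data, so the result
    covers more characters than a Unicode 3.0 listing would.
    """
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in categories:
            # name() takes a default for characters without a name
            yield f"U{cp:05X} {unicodedata.name(ch, '')}"


if __name__ == "__main__":
    # Print only the first ten lines; the full list is long.
    for line in itertools.islice(letters(), 10):
        print(line)
```

Passing a different tuple of categories, for example ("Lo", "Lt"),
lists other classes in the same format.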
As explained at
http://www.w3.org/TR/REC-xml/#NT-Letter
other character classes need to be included (Ll, Lu, Lo, Lt, Nl, Mc,
Me, Mn, Lm, Nd) and some character ranges are excluded. The XPath above
could be adjusted accordingly (or the character range list given in the
XML spec could be made into a regexp), but the above is probably about
as long as you'd want to type on the command line rather than putting
the XQuery/XPath into a file.
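The category part of that adjustment might look roughly like the
following in Python. It is a sketch under several assumptions: the split
into "name start" and "other name character" categories follows the
appendix linked above, the excluded ranges and the punctuation that the
spec allows separately (such as underscore and hyphen) are deliberately
not handled, the category data comes from the Unicode version bundled
with Python, and the names used here are invented for the example.

```python
import unicodedata

# Categories a name may start with, and the extra ones allowed after
# the first character, per the appendix linked above.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_EXTRA_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def category_allows(ch, first=False):
    """Return True if ch's Unicode category is one of the listed ones.

    This is a necessary condition only: the spec's excluded ranges
    and separately allowed punctuation are NOT applied here.
    """
    cat = unicodedata.category(ch)
    if cat in NAME_START_CATEGORIES:
        return True
    return (not first) and cat in NAME_EXTRA_CATEGORIES


if __name__ == "__main__":
    # Dotless i (U+0131) has category Ll, so it passes as a first character.
    print(category_allows("\u0131", first=True))
```

A character passing this check is only a candidate; the exclusions in
the spec still have to be applied before treating it as allowed.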
David