XML.org
OASIS Mailing List Archives

RE: [xml-dev] Whither JDOM?

Thank you for your prompt response.  I'm quite sure that JDOM is
signalling the error, having looked at the code, the documentation, and
the mailing list.

Here's the documentation for the Verifier method that the
org.jdom.Element constructor uses:
  "This is a utility function for determining whether a specified
   character is a letter according to production 84 of the XML 1.0
   specification."

http://www.jdom.org/docs/apidocs/org/jdom/Verifier.html#isXMLLetter(char)

JDOM presentation material cites this name checking as an advantage over
other interfaces, which don't keep you from producing bad documents.

The source code for org.jdom.Verifier bears the documentation out: 

  public static boolean isXMLLetter(char c) {
        // Note that order is very important here.  The search proceeds
        // from lowest to highest values, so that no searching occurs
        // above the character's value.  BTW, the first line is
        // equivalent to:
        // if (c >= 0x0041 && c <= 0x005A) return true;

        if (c < 0x0041) return false;  if (c <= 0x005a) return true;
        if (c < 0x0061) return false;  if (c <= 0x007A) return true;
        if (c < 0x00C0) return false;  if (c <= 0x00D6) return true;
        if (c < 0x00D8) return false;  if (c <= 0x00F6) return true;
        if (c < 0x00F8) return false;  if (c <= 0x00FF) return true;
        if (c < 0x0100) return false;  if (c <= 0x0131) return true;
        // ... and so on for a couple of hundred more lines ...

It appears that JDOM hasn't been updated for XML 1.0 Fifth Edition.

In this case, the cited production [84] in XML 1.0 is now deprecated and
unused within the REC.  The new productions [4] and [5] are simpler and
more inclusive.  
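
For comparison, here is a minimal sketch of what a check based on the
Fifth Edition productions could look like.  The ranges are the ones given
in productions [4] (NameStartChar), [4a] (NameChar), and [5] (Name) of the
REC; the class name Xml5Names and its method names are hypothetical and
are not part of JDOM.  It works on code points rather than char, since
production [4] extends beyond the BMP.  Note that it accepts ':' as the
XML grammar does, whereas a namespace-aware API would presumably want to
treat the colon separately.

```java
/** Sketch of the XML 1.0 Fifth Edition name productions [4], [4a], and [5]. */
final class Xml5Names {

    /** Production [4] NameStartChar, tested on a full Unicode code point. */
    static boolean isNameStartChar(int cp) {
        return cp == ':' || cp == '_'
            || (cp >= 'A' && cp <= 'Z')
            || (cp >= 'a' && cp <= 'z')
            || (cp >= 0xC0 && cp <= 0xD6)
            || (cp >= 0xD8 && cp <= 0xF6)
            || (cp >= 0xF8 && cp <= 0x2FF)
            || (cp >= 0x370 && cp <= 0x37D)
            || (cp >= 0x37F && cp <= 0x1FFF)
            || (cp >= 0x200C && cp <= 0x200D)
            || (cp >= 0x2070 && cp <= 0x218F)
            || (cp >= 0x2C00 && cp <= 0x2FEF)
            || (cp >= 0x3001 && cp <= 0xD7FF)
            || (cp >= 0xF900 && cp <= 0xFDCF)
            || (cp >= 0xFDF0 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0xEFFFF);
    }

    /** Production [4a] NameChar: NameStartChar plus a few extra characters. */
    static boolean isNameChar(int cp) {
        return isNameStartChar(cp)
            || cp == '-' || cp == '.'
            || (cp >= '0' && cp <= '9')
            || cp == 0xB7
            || (cp >= 0x300 && cp <= 0x36F)
            || (cp >= 0x203F && cp <= 0x2040);
    }

    /** Production [5] Name ::= NameStartChar (NameChar)* */
    static boolean isName(String s) {
        if (s == null || s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) return false;
        // Step by code point so supplementary characters are handled whole.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNameChar(cp)) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isName("\uFF41\uFF42\uFF43")); // fullwidth "abc": true
        System.out.println(isName("1abc"));               // digit first: false
    }
}
```

Sixteen range tests replace the couple of hundred lines needed for the
Appendix B character classes.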

It was when sending a message about this issue that I noticed that the
JDOM archives haven't gotten any new mail since January.

My own message (attached) didn't show up either, and I've been a list
member for a long time.

Leigh.

> 
> -----Original Message-----
> From: Michael Kay [mailto:mike@saxonica.com] 
> Sent: Thursday, March 19, 2009 4:24 PM
> To: Klotz, Leigh; xml-dev@lists.xml.org
> Subject: RE: [xml-dev] Whither JDOM?
> 
> I'd be very surprised if JDOM cares what it finds in an element name:
> I suspect that if the XML parser accepts XML 1.05 style names, then JDOM
> will accept them too. But I might be wrong. Do you have any evidence to
> the contrary?
> 
> Michael Kay
> http://www.saxonica.com/ 
> 
> > -----Original Message-----
> > From: Klotz, Leigh [mailto:Leigh.Klotz@xerox.com]
> > Sent: 19 March 2009 23:16
> > To: xml-dev@lists.xml.org
> > Subject: [xml-dev] Whither JDOM?
> > 
> > I sent a note to the jdom-interest list about JDOM and XML 1.0 (5th)
> > production rules for element names, and noticed it didn't make it to
> > the archives.  Then I noticed nothing's new in the archives since
> > January.
> > A few quick web searches turned up no tea-leaf reading, and I didn't
> > see any discussion here either.
> > 
> > So, anybody have ideas what's going on, and if it's nothing good, for
> > the prospects of getting JDOM updated?
> > 
> > Leigh.
--- Begin Message ---
JDOM 1.1 won't create elements whose names contain characters in the
following ranges:
  Unicode 0xFF41-0xFF5A (FULLWIDTH LATIN SMALL LETTER A to
    FULLWIDTH LATIN SMALL LETTER Z)
  Unicode 0xFF21-0xFF3A (FULLWIDTH LATIN CAPITAL LETTER A to
    FULLWIDTH LATIN CAPITAL LETTER Z)

The JDOM 1.1 source for org.jdom.Verifier.isXMLLetter cites production
84 of the XML 1.0 Recommendation for its table of allowed characters. 

However, according to http://www.w3.org/TR/REC-xml/ the whole of
Appendix B (which contains Production 84) is obsolete and is not used
within the recommendation.  The XML Rec instead uses production [4] for
NameStartChar and [4a] for NameChar, which production [5] combines into
Name.

The productions at [4] and [5] are considerably smaller than those of
Appendix B, and are more inclusive, providing for greater utility in
I18N applications of XML.

Furthermore, according to http://www.w3.org/TR/REC-xml/ Appendix J
(Non-Normative), the characters I mention above are not only allowed,
but encouraged for use in XML Names, because the Unicode ID_Start and
ID_Continue properties of these code points are True.

The XML REC says:

    1. The first character of any name should have a Unicode property of
       ID_Start, or else be '_' #x5F.
    2. Characters other than the first should have a Unicode property of
       ID_Continue, or ...

You can see that ID_Start and ID_Continue are True on the individual
pages for the small letters here:
http://unicode.org/cldr/utility/character.jsp?a=FF41
to
http://unicode.org/cldr/utility/character.jsp?a=FF5A
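
The same thing can be checked roughly from Java itself.  The sketch below
assumes that Character.isUnicodeIdentifierStart and
Character.isUnicodeIdentifierPart are an acceptable stand-in for the
ID_Start and ID_Continue properties; they are the JDK's approximation and
depend on the Unicode version the JDK ships with, so the Unicode pages
above remain the authority.  The class name IdPropertyCheck and its method
are made up for this illustration.

```java
/** Rough check of identifier properties for a range of code points. */
final class IdPropertyCheck {

    /**
     * Returns true if every code point in [from, to] may both start and
     * continue a Unicode identifier according to java.lang.Character.
     */
    static boolean allIdentifierChars(int from, int to) {
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isUnicodeIdentifierStart(cp)
                    || !Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // FULLWIDTH LATIN SMALL LETTER A..Z
        System.out.println(allIdentifierChars(0xFF41, 0xFF5A));
        // FULLWIDTH LATIN CAPITAL LETTER A..Z
        System.out.println(allIdentifierChars(0xFF21, 0xFF3A));
    }
}
```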

I recommend that org.jdom.Verifier.isXMLLetter be updated to use
productions [4], [4a], and [5] of XML 1.0 Fifth Edition.
It's quite likely that some of the other character class verifiers need
updating as well, but I didn't examine them.

Leigh.
--- End Message ---

