OASIS Mailing List Archives



   ANN: Gorille (alpha)

Gorille [1] is a very simple Java library for testing XML content and 
labels against lists of allowable Unicode characters like those provided in 
XML 1.0 [2] and XML 1.1 [3].  Gorille is available under the Mozilla Public 
License.

Gorille uses an XML format to specify lists of characters according to 
either XML 1.0 conventions (with its BaseChar, Ideographic, CombiningChar, 
Digit, and Extender productions) or XML 1.1 conventions (NameStartChar, 
NameChar).  Both forms permit specification of the Char and S productions 
for content characters and whitespace.  I've included sample lists for both 
XML 1.0 and XML 1.1, as well as an ASCII-only version of XML 1.0.

Gorille performs checking of Name, Names, NMTOKEN, and NMTOKENS, as well as 
character checking for any of the productions listed above.  This checking 
is performed by XML parsers as documents are parsed, but Gorille may be 
useful for checking XML documents generated by programs or to restrict 
documents to subsets of the characters allowed by XML.  Gorille relies 
completely on Java's built-in support for Unicode strings and characters, 
though it doesn't use any of the Unicode property information Java provides.
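
As an illustration of the kind of checking described above, here is a minimal 
sketch of Name and NMTOKEN testing against the XML 1.1 NameStartChar and 
NameChar productions.  It is not Gorille's API: the class name XmlNameSketch 
and its methods are hypothetical, the ranges are hard-coded rather than read 
from a character list file, and it assumes a Java version with code point 
support (Java 8 or later).

```java
final class XmlNameSketch {
    // Inclusive code point ranges for the XML 1.1 NameStartChar production.
    private static final int[][] NAME_START = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
        {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
        {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF},
        {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };

    // Ranges that NameChar allows in addition to NameStartChar.
    private static final int[][] NAME_EXTRA = {
        {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
        {0x300, 0x36F}, {0x203F, 0x2040}
    };

    private static boolean inRanges(int cp, int[][] ranges) {
        for (int[] r : ranges) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    static boolean isNameStartChar(int cp) {
        return inRanges(cp, NAME_START);
    }

    static boolean isNameChar(int cp) {
        return isNameStartChar(cp) || inRanges(cp, NAME_EXTRA);
    }

    // Name ::= NameStartChar (NameChar)*
    static boolean isName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        // Step by code point so characters above U+FFFF are tested whole.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // Nmtoken ::= (NameChar)+
    static boolean isNmtoken(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(XmlNameSketch::isNameChar);
    }
}
```

A restricted subset, such as an ASCII-only list, would amount to swapping in 
shorter range tables while leaving the checking logic unchanged.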

Gorille does provide for some rather perverse modifications of the 
productions - you could, for instance, require that all content be in 
control characters while all names be ideographic - but my hope is that 
developers will use it in reasonable ways which don't create arbitrary 
explosions as programs reject bad information.

I'll be using Gorille to provide name- and content-checking for MOE [4], 
but I also hope to create a SAXFilter which uses it and perhaps a Java 
FilterReader for preprocessing content before it reaches a parser.
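
To make the FilterReader idea concrete, here is a rough sketch of a reader 
that rejects any character outside the XML 1.0 Char production before it 
reaches a parser.  It does not use Gorille: the class name 
XmlCharCheckingReader is hypothetical, the ranges are hard-coded, and 
throwing an IOException on the first bad character is just one possible 
policy (a real filter might replace or drop characters instead).

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

final class XmlCharCheckingReader extends FilterReader {
    // True when the previous char was a high surrogate awaiting its partner.
    private boolean pendingHigh = false;

    XmlCharCheckingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c == -1) {
            checkEnd();
        } else {
            check((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n == -1) {
            checkEnd();
        }
        for (int i = 0; i < n; i++) {
            check(buf[off + i]);
        }
        return n;
    }

    private void checkEnd() throws IOException {
        if (pendingHigh) {
            throw new IOException("Unpaired high surrogate at end of input");
        }
    }

    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    //        | [#x10000-#x10FFFF]
    // The last range arrives as surrogate pairs in Java's UTF-16 chars.
    private void check(char c) throws IOException {
        if (pendingHigh) {
            if (!Character.isLowSurrogate(c)) {
                throw new IOException("Unpaired high surrogate before U+"
                        + Integer.toHexString(c).toUpperCase());
            }
            pendingHigh = false;
            return;
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = true;
            return;
        }
        boolean ok = c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
        if (!ok) {
            throw new IOException("Character not allowed by XML 1.0 Char: U+"
                    + Integer.toHexString(c).toUpperCase());
        }
    }
}
```

Wrapping the reader handed to a parser, for example 
new XmlCharCheckingReader(new java.io.StringReader(text)), would then stop 
bad input with an ordinary I/O error rather than a parser-specific one.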

Gorille is currently in alpha. I believe the basic functionality is 
complete, but there's still potential for improvement, expansion, and as 
always, better documentation.  (Including RDDL documents for the character 
list and test files!)

[1] http://simonstl.com/projects/gorille
[2] http://www.w3.org/TR/REC-xml
[3] http://www.w3.org/TR/xml11/
[4] http://moe.sourceforge.net

Simon St.Laurent
Associate Editor, O'Reilly & Associates



Copyright 2001 XML.org. This site is hosted by OASIS