Gorille is a very simple Java library for testing XML content and
labels against lists of allowable Unicode characters like those provided in
XML 1.0 and XML 1.1. Gorille is available under the Mozilla Public
License.

Gorille uses an XML format to specify lists of characters according to
either XML 1.0 conventions (with its BaseChar, Ideographic, CombiningChar,
Digit, and Extender productions) or XML 1.1 conventions (NameStartChar,
NameChar). Both forms permit specification of the Char and S productions
for content characters and whitespace. I've included sample lists for both
XML 1.0 and XML 1.1, as well as an ASCII-only version of XML 1.0.
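
To make the idea of per-production character lists concrete, here is a minimal Java sketch of an ASCII-only reading of the XML 1.0 productions. It is not Gorille's XML list format or its API: the class name CharListSketch, its methods, and the choice of ranges are all assumptions made for illustration, and the sample list shipped with Gorille may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Gorille's classes or list format.
final class CharListSketch {
    // Production name -> inclusive {low, high} code point ranges.
    private final Map<String, int[][]> productions = new LinkedHashMap<>();

    CharListSketch define(String production, int[][] ranges) {
        productions.put(production, ranges);
        return this;
    }

    // True if the code point falls in any range listed for the production.
    boolean matches(String production, int codePoint) {
        int[][] ranges = productions.get(production);
        if (ranges == null) {
            return false; // unknown production: nothing is allowed
        }
        for (int[] r : ranges) {
            if (codePoint >= r[0] && codePoint <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // One possible ASCII-only subset of the XML 1.0 productions (an assumption).
    static CharListSketch asciiXml10() {
        return new CharListSketch()
            .define("Char", new int[][] {{0x9, 0xA}, {0xD, 0xD}, {0x20, 0x7F}})
            .define("S", new int[][] {{0x20, 0x20}, {0x9, 0xA}, {0xD, 0xD}})
            .define("BaseChar", new int[][] {{'A', 'Z'}, {'a', 'z'}})
            .define("Ideographic", new int[][] {})
            .define("CombiningChar", new int[][] {})
            .define("Digit", new int[][] {{'0', '9'}})
            .define("Extender", new int[][] {});
    }

    public static void main(String[] args) {
        CharListSketch ascii = asciiXml10();
        System.out.println(ascii.matches("BaseChar", 'q'));
        System.out.println(ascii.matches("Ideographic", 0x4E00));
    }
}
```

Keeping each production as a plain list of ranges means a restricted subset is just a shorter list, and the empty lists show how a production can be switched off entirely.
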
Gorille performs checking of Name, Names, NMTOKEN, and NMTOKENS, as well as
character checking for any of the productions listed above. This checking
is already performed by XML parsers as documents are parsed, but Gorille may
be useful for checking XML documents generated by programs or for restricting
documents to subsets of the characters allowed by XML. Gorille relies
completely on Java's built-in support for Unicode strings and characters,
though it doesn't use any of the Unicode property information Java provides.
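
For readers who want to see what this kind of checking involves, the following is a minimal sketch of Name, Names, and Nmtoken checks against the XML 1.1 NameStartChar and NameChar ranges. It is not Gorille's code or API: the class NameCheckSketch and its method names are hypothetical, the ranges are hard-coded from the XML 1.1 productions rather than loaded from a list, and it uses the code point methods of newer Java releases.

```java
// Hypothetical illustration only; not Gorille's classes or API.
final class NameCheckSketch {
    // XML 1.1 NameStartChar as inclusive {low, high} code point ranges.
    private static final int[][] NAME_START = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
        {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
        {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF},
        {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };
    // Characters NameChar allows in addition to NameStartChar.
    private static final int[][] NAME_EXTRA = {
        {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
        {0x300, 0x36F}, {0x203F, 0x2040}
    };

    private static boolean inRanges(int cp, int[][] ranges) {
        for (int[] r : ranges) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    static boolean isNameStartChar(int cp) {
        return inRanges(cp, NAME_START);
    }

    static boolean isNameChar(int cp) {
        return isNameStartChar(cp) || inRanges(cp, NAME_EXTRA);
    }

    // Nmtoken: one or more NameChar.
    static boolean isNmtoken(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp); // step over surrogate pairs
        }
        return true;
    }

    // Name: a NameStartChar followed by zero or more NameChar.
    static boolean isName(String s) {
        return !s.isEmpty() && isNameStartChar(s.codePointAt(0)) && isNmtoken(s);
    }

    // Names: Name values separated by single spaces.
    static boolean isNames(String s) {
        for (String part : s.split(" ", -1)) {
            if (!isName(part)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isName("xml:lang"));
        System.out.println(isName("1abc"));
        System.out.println(isNmtoken("1abc"));
    }
}
```

Because every NameStartChar is also a NameChar, isName only needs to test the first code point separately and can reuse the Nmtoken loop for the rest.
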
Gorille does provide for some rather perverse modifications of the
productions - you could, for instance, require that all content be in
control characters while all names be ideographic - but my hope is that
developers will use it in reasonable ways which don't create arbitrary
explosions as programs reject bad information.

I'll be using Gorille to provide name- and content-checking for MOE,
but hope to also create a SAXFilter which uses it and perhaps a Java
FilterReader for preprocessing content before it reaches a parser.
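
As a sketch of what such a FilterReader might look like (this is not existing Gorille code), the class below wraps another Reader and throws an IOException when it sees a character outside the XML 1.0 Char production. The class name XmlCharCheckingReader and the helper accepts are hypothetical, and the Char ranges are hard-coded here rather than taken from a character list.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration only; not part of Gorille.
final class XmlCharCheckingReader extends FilterReader {
    // True when the previous char was a high surrogate awaiting its partner.
    private boolean pendingHigh = false;

    XmlCharCheckingReader(Reader in) {
        super(in);
    }

    private void check(char c) throws IOException {
        if (pendingHigh) {
            if (!Character.isLowSurrogate(c)) {
                throw new IOException("Unpaired high surrogate");
            }
            pendingHigh = false; // any valid pair lies in #x10000-#x10FFFF
            return;
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = true;
            return;
        }
        // XML 1.0 Char, restricted to the Basic Multilingual Plane.
        boolean ok = c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
        if (!ok) {
            throw new IOException("Character not allowed by XML 1.0 Char: U+"
                    + Integer.toHexString(c).toUpperCase());
        }
    }

    private void checkEnd() throws IOException {
        if (pendingHigh) {
            throw new IOException("Input ended after a high surrogate");
        }
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c >= 0) {
            check((char) c);
        } else {
            checkEnd();
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) {
            check(buf[off + i]);
        }
        if (n < 0) {
            checkEnd();
        }
        return n;
    }

    // Convenience for trying the reader: drains the input, reports success.
    static boolean accepts(String s) {
        try (Reader r = new XmlCharCheckingReader(new StringReader(s))) {
            char[] buf = new char[64];
            while (r.read(buf) != -1) {
                // discard; only the checking matters here
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<a>ok</a>"));
        System.out.println(accepts("bad" + (char) 0));
    }
}
```

This version rejects bad input outright. A preprocessing variant could instead replace or drop the offending characters, though that choice changes the content that reaches the parser.
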
Gorille is currently in alpha. I believe the basic functionality is
complete, but there's still potential for improvement, expansion, and as
always, better documentation. (Including RDDL documents for the character
list and test files!)
Associate Editor, O'Reilly & Associates