Sorry to distract everyone from the limerick madness -
Gorille, a Java library for testing XML document parts against the
productions defined in XML 1.0, XML 1.1, or your favorite flavor, has just
reached its third release.
This release adds support for Unicode surrogate pairs, as permitted by XML
1.1. It also fixes a few glitches in representing characters with values
greater than 0xFFFF, which Java's 16-bit char primitive cannot hold in a
single value.
Many thanks to Elliotte Rusty Harold and John Cowan for pointing out both
problems and solutions in this field. (Additional thanks to my parents for
getting me the Unicode Standard 3.0 book for Christmas.)
Surrogate pairs are very tricky critters that seem to me to require
substantially more programming care than any other aspect of Unicode, and I
suspect that developers will be cursing them for a long time to come.
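For anyone who hasn't run into them yet, here is a minimal sketch of the arithmetic involved in turning a pair of Java chars into one code point. The class and method names are made up for illustration and are not Gorille's actual API; the ranges and offsets are those defined by Unicode for UTF-16.

```java
// Hypothetical helper, NOT part of Gorille: it only illustrates the arithmetic.
final class SurrogateSketch {

    // High (leading) surrogates occupy U+D800..U+DBFF.
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    // Low (trailing) surrogates occupy U+DC00..U+DFFF.
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // Combine a valid pair into the code point it encodes (U+10000..U+10FFFF).
    static int toCodePoint(char high, char low) {
        if (!isHighSurrogate(high) || !isLowSurrogate(low)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Read the code point starting at index i, pairing surrogates when possible.
    // An unpaired surrogate is returned unchanged so the caller can reject it.
    static int codePointAt(String s, int i) {
        char c = s.charAt(i);
        if (isHighSurrogate(c) && i + 1 < s.length()
                && isLowSurrogate(s.charAt(i + 1))) {
            return toCodePoint(c, s.charAt(i + 1));
        }
        return c;
    }
}
```

The care comes in everywhere else: any loop that walks a String one char at a time has to skip the second half of a pair, and a lone surrogate must be treated as an error rather than as a character.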
More information, downloads, CVS, etc. are available at:
The testing I've been able to perform so far is pretty crude stuff. If
anyone with more experience in Unicode or better tools for creating test
documents has time to explore this work, I'd greatly appreciate it. As XML
1.0 parsers already perform some of this testing, creating tests that go
outside of those bounds and reach Gorille (not just the parser) is tricky.
Also, I'm planning to write a code generator that produces compilable
rules classes from the XML files for people who are uncomfortable with the
notion of specifying productions in loadable and modifiable XML documents.
Gorille 0.3 is still an alpha version, but it's improving rapidly.
Associate Editor, O'Reilly & Associates