I'm happy to report that I've finally published an alpha version of Markup
Object Events (MOE) [1]. MOE is a Mozilla-licensed Java API, with a
supporting set of classes, for markup processing using both events and
object trees. MOE programs can work purely with events, purely with trees,
or with combinations of both, effectively providing a "middle way" between
SAX and DOM.
MOE emerged from my work over the past summer on SAX filters, notably
Regular Fragmentations [2]. I couldn't justify creating DOM trees for the
relatively tiny changes I've been making to documents, but SAX's streaming
approach made it difficult to do things like modify the attributes of an
element based on its contents. I spent a lot of effort building temporary
containers, and concluded that the "temporary" containers were interesting
in their own right.
MOE permits all nodes to have:
* A three-part namespace-aware name (prefix, local name, URI - QName available)
* Unordered content (a set) - think attributes
* Ordered content (a list) - think child elements, text, etc.
* Annotations (a map) - any other information you need, largely unconstrained
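
As a rough illustration of that node model, here is a minimal Java sketch. It is not MOE's actual API: the class name NodeSketch, its fields, the "#text" naming convention, and the single "value" slot for text and attribute values are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; these type and member names are not taken from MOE.
class NodeSketch {
    // Three-part namespace-aware name.
    final String prefix;     // empty string when there is no prefix
    final String localName;
    final String uri;        // namespace URI, empty string when none

    // Assumption for this sketch: text and attribute nodes carry a value,
    // element nodes leave it null.
    final String value;

    // Unordered content (think attributes). LinkedHashSet is used only so
    // that iteration is predictable; callers should not rely on the order.
    final Set<NodeSketch> unordered = new LinkedHashSet<>();

    // Ordered content (think child elements and text), in document order.
    final List<NodeSketch> ordered = new ArrayList<>();

    // Annotations: any other information, keyed by name.
    final Map<String, Object> annotations = new LinkedHashMap<>();

    NodeSketch(String prefix, String localName, String uri, String value) {
        this.prefix = prefix;
        this.localName = localName;
        this.uri = uri;
        this.value = value;
    }

    /** Returns prefix:localName, or just localName when there is no prefix. */
    String qName() {
        return prefix.isEmpty() ? localName : prefix + ":" + localName;
    }

    public static void main(String[] args) {
        NodeSketch para = new NodeSketch("doc", "para", "http://example.com/doc", null);
        para.unordered.add(new NodeSketch("", "id", "", "p1"));      // an attribute
        para.ordered.add(new NodeSketch("", "#text", "", "Hello"));  // a text child
        para.annotations.put("sourceLine", 12);                      // extra information
        System.out.println(para.qName() + " has " + para.unordered.size()
                + " attribute(s) and " + para.ordered.size() + " child node(s)");
    }
}
```

Keeping the three kinds of content in separate collections means an annotation never has to pretend to be an attribute or a child in order to travel with the node.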
MOE's foundation is very abstract (and defined as interfaces for another
level of useful abstraction), but can be readily applied to XML document
processing. The abstraction and interface approach make it possible to use
MOE to represent non-XML content, to preserve lexical information of all
sorts, to represent content which is not well-formed, and to create
annotated object models which host information which may not have any
lexical representation. This alpha release focuses more simply on an
Infoset-like view of XML - something like the SAX2 view of XML.
Developers can use MOE as storage for information from SAX events, or they
can create complete trees built from MOE events. Nodes can listen to flows
of information, building a tree structure, and report when they have
"finished" - when an element reaches its end tag, for instance. Those nodes
can then be reported again as a stream of MOE (or SAX) events, converting
the tree back into events.
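
The round trip described above (events in, a "finished" container reported at the end tag, events back out) can be sketched with plain SAX. This is only an illustration under stated assumptions: FinishedNodeDemo, Elem, Builder, parse and replay are hypothetical names, not MOE classes, and the sketch ignores namespaces. It also shows the kind of change that is awkward in pure SAX: setting an attribute based on an element's contents.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: none of these names come from MOE itself.
class FinishedNodeDemo {

    /** A small container: attributes plus ordered content (Elem or String). */
    static class Elem {
        final String qName;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Object> content = new ArrayList<>();
        Elem(String qName) { this.qName = qName; }

        /** Concatenates the text found directly inside this element. */
        String text() {
            StringBuilder sb = new StringBuilder();
            for (Object o : content) if (o instanceof String) sb.append(o);
            return sb.toString();
        }
    }

    /** Builds containers from SAX events and reports each one at its end tag. */
    static class Builder extends DefaultHandler {
        private final Deque<Elem> open = new ArrayDeque<>();
        private final Consumer<Elem> onFinished;
        Builder(Consumer<Elem> onFinished) { this.onFinished = onFinished; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            Elem e = new Elem(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                e.attributes.put(atts.getQName(i), atts.getValue(i));
            }
            if (!open.isEmpty()) open.peek().content.add(e);
            open.push(e);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!open.isEmpty()) open.peek().content.add(new String(ch, start, length));
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            onFinished.accept(open.pop()); // contents are now fully known
        }
    }

    /** Parses xml and returns elements in the order their end tags arrived. */
    static List<Elem> parse(String xml) {
        List<Elem> finished = new ArrayList<>();
        Builder builder = new Builder(e -> {
            // Content-dependent attribute change, made once the element is complete.
            if (e.qName.equals("price")) {
                e.attributes.put("length", String.valueOf(e.text().length()));
            }
            finished.add(e);
        });
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), builder);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
        return finished;
    }

    /** Replays a finished container as SAX events, turning the tree back into a stream. */
    static void replay(Elem e, ContentHandler out) {
        try {
            AttributesImpl atts = new AttributesImpl();
            for (Map.Entry<String, String> a : e.attributes.entrySet()) {
                atts.addAttribute("", "", a.getKey(), "CDATA", a.getValue());
            }
            out.startElement("", "", e.qName, atts);
            for (Object o : e.content) {
                if (o instanceof Elem) {
                    replay((Elem) o, out);
                } else {
                    char[] c = ((String) o).toCharArray();
                    out.characters(c, 0, c.length);
                }
            }
            out.endElement("", "", e.qName);
        } catch (SAXException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        for (Elem e : parse("<item><price>42</price></item>")) {
            System.out.println(e.qName + " " + e.attributes);
        }
    }
}
```

Because end tags close from the inside out, the inner "price" element is reported as finished before the enclosing "item". Any finished container can be handed to replay, so a downstream SAX handler never needs to know a tree existed in between.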
While it is possible to use MOE in place of SAX or DOM (given a parser), it
was designed to complement those approaches, not replace them. The current
API supports only tree-walking navigation, not XPath or similar
conveniences. That's fine for small trees, but walking large trees by hand
won't be much fun.
MOE is still pretty raw. It's had a little bit of review from a few
people, and has benefited greatly from that, but I suspect there's a lot
more review to come. I feel reasonably comfortable with the logic of the
core (hence the alpha release), but the visitor, adapter, and factory
classes are still prone to wild gyrations. Namespace support - especially
declaration management - has also proven trickier than expected, though
that's not a huge surprise.
There is also a very simple Swing application - MOEWorkshop - which lets
you explore MOE trees visually, but it still has a very long way to go to
become the debugging tool for chains of MOE processing that I want.
MOE owes a great deal in spirit to various work done by the Python and Perl
communities. While I work primarily in Java, hearing about the various
tools for working with partial trees in Python and Perl inspired much of
this work.
MOE is deliberately written in what I call "naive Java". Given that I
expect MOE to change substantially over time, I've opted to focus on
clarity rather than performance. I've attempted to create a class
structure which separates particular kinds of processing from the basic
object model, but I've undoubtedly made some slips as well. I'd also like
to develop a comprehensive set of unit tests, but haven't yet had time to
do so. (The tests I have are very simple.)
I'll be rewriting many of my SAX filters to use MOE over the next few
months, starting with Regular Fragmentations. I'm hoping that applying MOE
to a wide variety of general problems will help me evolve it into a more
powerful toolkit over time. Comments, suggestions, queries, and
contributions are all welcome. CVS and mailing lists are available through
the MOE SourceForge project page [3].
[1] - http://moe.sourceforge.net/
[2] - http://simonstl.com/projects/fragment/
[3] - http://sourceforge.net/projects/moe/
Simon St.Laurent
Associate Editor, O'Reilly & Associates
http://simonstl.com