All this discussion of character entities and references reminded me that
these issues are genuinely aggravating, to say the least. Aggravating
issues tend to drive me to write code, so I did.
The result is a small package I'm calling Ents. The name is short for
entities, since the package does less than full entity processing, though
the Tolkien connection is interesting as well.
Ents, like most of my filters, takes a list of rules in an XML format. In
this case, the rules define equivalences between character references and
entity names. The processor then uses a Java FilterReader to process a
document, either replacing entity names with character references (to send
to computers that don't care about entities) or character references with
entity names (more typically for humans). The processing is reversible.
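
To make the FilterReader approach concrete, here is a minimal sketch of the names-to-references direction. It is not the Ents code: the class name EntityToReferenceReader, the convert helper, the in-memory Map standing in for the XML rules file, and the 32-character lookahead cap are all assumptions made for illustration.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical class, not part of the Ents package: rewrites &name; as &#NNN;
// for every name found in the rules map and passes everything else through.
class EntityToReferenceReader extends FilterReader {
    private static final int MAX_NAME_LENGTH = 32; // arbitrary cap on lookahead (assumption)
    private final Map<String, Integer> rules;      // entity name -> code point
    private final StringBuilder pending = new StringBuilder(); // output not yet returned
    private int pendingPos = 0;

    EntityToReferenceReader(Reader in, Map<String, Integer> rules) {
        super(new PushbackReader(in, 1)); // one char of pushback for the terminator
        this.rules = rules;
    }

    private static boolean isNameChar(int c) {
        // Simplified name test; the XML Name production allows more than this.
        return Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_' || c == ':';
    }

    @Override
    public int read() throws IOException {
        if (pendingPos < pending.length()) {
            return pending.charAt(pendingPos++);
        }
        int c = in.read();
        if (c != '&') {
            return c; // ordinary character, or -1 at end of stream
        }
        pending.setLength(0);
        pendingPos = 0;
        StringBuilder name = new StringBuilder();
        int next;
        while ((next = in.read()) != -1 && isNameChar(next)
                && name.length() < MAX_NAME_LENGTH) {
            name.append((char) next);
        }
        Integer codePoint = (next == ';') ? rules.get(name.toString()) : null;
        if (codePoint != null) {
            // Known entity: emit a decimal character reference instead.
            pending.append('#').append(codePoint.intValue()).append(';');
        } else {
            // Unknown or malformed: emit the text unchanged and re-read the terminator.
            pending.append(name);
            if (next != -1) {
                ((PushbackReader) in).unread(next);
            }
        }
        return '&';
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + count++] = (char) c;
        }
        return count == 0 ? -1 : count;
    }

    // Convenience wrapper for trying the reader on a string.
    static String convert(String text, Map<String, Integer> rules) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new EntityToReferenceReader(new StringReader(text), rules)) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With rules mapping nbsp to 160 and eacute to 233, convert("caf&eacute;&nbsp;", rules) yields "caf&#233;&#160;". Names missing from the rules, such as &amp;, and existing numeric references pass through untouched, which is what keeps the transformation reversible by a matching reader working in the other direction.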
Ents comes with a rules file built from the entity declarations in
Modularization of XHTML. I left in the descriptions of the characters, but
they do nothing in processing. Ents rules can easily be embedded in other
documents (schemas or whatever), and the rules loader will ignore all
content but its own rules. These rules may also be compiled into Java,
using a code-generating class that is part of the package.
I may be adding support for one-way transformation of general entities into
text, as well as a SAXFilter which processes the skippedEntity event using
these rules, but neither of those options is presently available.
Ents 0.2 is available under the Mozilla Public License 1.1. More details
and code are available at:
http://simonstl.com/projects/ents/
As always, comments and suggestions are welcome.
Simon St.Laurent
Associate Editor, O'Reilly & Associates
http://simonstl.com