I am pleased to announce that version 0.5 of VTD-XML, a new,
non-extractive, Java-based XML processing API licensed under …,
is now freely available on sourceforge.net. For source code,
documentation, a detailed description of the API, and code examples,
please visit the project page.
Capable of random access, VTD-XML aims to be both memory-efficient
and high-performance. The starting point of this project is the
observation that, for XML documents that don't declare entities in a
DTD, tokenization can indeed be done by recording only the offset and
length of each token.
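To make the offset/length idea concrete, here is a toy sketch in Java. It is not VTD-XML's parser, and the class and method names are hypothetical. It handles only a tiny subset of XML: it skips end tags and ignores attributes, comments, and entities. Each token records only where it starts and how many bytes it spans, and its text is recovered from the untouched document bytes only when needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OffsetTokenDemo {
    // A token is identified only by where it starts and how many bytes it spans;
    // its characters stay in the original document buffer.
    record Token(int offset, int length) {}

    // Toy tokenizer: records start-tag names and text runs as (offset, length).
    // It skips end tags and ignores attributes, comments, PIs, CDATA, and entities.
    static List<Token> tokenize(byte[] doc) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < doc.length) {
            if (doc[i] == '<') {
                int start = i + 1;
                if (start < doc.length && doc[start] == '/') {
                    // End tag: skip past its closing '>'.
                    while (i < doc.length && doc[i] != '>') i++;
                    i++;
                    continue;
                }
                int end = start;
                while (end < doc.length && doc[end] != '>' && doc[end] != '/' && doc[end] != ' ') end++;
                tokens.add(new Token(start, end - start));          // element name
                while (end < doc.length && doc[end] != '>') end++;  // skip any attributes
                i = end + 1;
            } else {
                int start = i;
                while (i < doc.length && doc[i] != '<') i++;
                tokens.add(new Token(start, i - start));            // text content
            }
        }
        return tokens;
    }

    // Content is materialized only on demand, straight from the untouched bytes.
    static String text(byte[] doc, Token t) {
        return new String(doc, t.offset(), t.length(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<a><b>hi</b></a>".getBytes(StandardCharsets.UTF_8);
        for (Token t : tokenize(doc)) {
            System.out.println(t + " -> " + text(doc, t));
        }
    }
}
```

Note that no string is created during tokenization. The document bytes are the only copy of the content, which is what the non-extractive approach relies on.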
The core technology of VTD-XML is a binary format called Virtual
Token Descriptor (VTD). A VTD record is a fixed-length binary record
that encodes the starting offset, length, type, and nesting depth of
a token in an XML document. Because VTD records don't contain the
token content, they work alongside the original XML document, which
the processing model keeps intact in memory.
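One way to picture a fixed-length record carrying offset, length, type, and depth is to pack the four fields into a single Java `long`. The field widths below are invented for illustration and are not the actual VTD bit layout. The class, its constants, and the numeric type codes are likewise hypothetical.

```java
public final class PackedTokenRecord {
    // Hypothetical field widths, for illustration only (NOT the real VTD layout).
    static final int OFFSET_BITS = 30; // offsets up to 2^30 - 1
    static final int LENGTH_BITS = 20; // token lengths up to 2^20 - 1
    static final int TYPE_BITS = 4;    // up to 16 token types
    static final int DEPTH_BITS = 8;   // nesting depth up to 255

    static final int LENGTH_SHIFT = OFFSET_BITS;              // bit 30
    static final int TYPE_SHIFT = LENGTH_SHIFT + LENGTH_BITS; // bit 50
    static final int DEPTH_SHIFT = TYPE_SHIFT + TYPE_BITS;    // bit 54 (uses bits 54..61)

    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    static void check(int value, int bits, String name) {
        if (value < 0 || value > mask(bits)) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }

    // Pack all four fields into one primitive long: no object, no header.
    static long pack(int offset, int length, int type, int depth) {
        check(offset, OFFSET_BITS, "offset");
        check(length, LENGTH_BITS, "length");
        check(type, TYPE_BITS, "type");
        check(depth, DEPTH_BITS, "depth");
        return (long) offset
             | ((long) length << LENGTH_SHIFT)
             | ((long) type << TYPE_SHIFT)
             | ((long) depth << DEPTH_SHIFT);
    }

    static int offset(long r) { return (int) (r & mask(OFFSET_BITS)); }
    static int length(long r) { return (int) ((r >>> LENGTH_SHIFT) & mask(LENGTH_BITS)); }
    static int type(long r)   { return (int) ((r >>> TYPE_SHIFT) & mask(TYPE_BITS)); }
    static int depth(long r)  { return (int) ((r >>> DEPTH_SHIFT) & mask(DEPTH_BITS)); }

    public static void main(String[] args) {
        // Token "hi" from <a><b>hi</b></a>: offset 6, length 2, hypothetical type 3, depth 2.
        long r = pack(6, 2, 3, 2);
        System.out.println(offset(r) + " " + length(r) + " " + type(r) + " " + depth(r));
    }
}
```

The trade-off in any such layout is between the maximum document size (offset bits) and the room left for length, type, and depth within a fixed record size.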
VTD's memory-conserving features can be summarized as follows:
* Avoid per-object overhead -- In many programming languages,
allocating an object incurs a small amount of memory overhead. A VTD
record is immune to this overhead because it is not an object.
* Bulk allocation of storage -- Because they are fixed in length, VTD
records are stored in large memory blocks, which are more efficient
to allocate and garbage-collect. By allocating one large array for
4096 VTD records, one incurs the per-array overhead (16 bytes in
JDK 1.4) only once across 4096 records, reducing the per-record
overhead to about 0.004 bytes (16 / 4096).
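Both points can be illustrated with a growable buffer that stores records as primitive `long` values in arrays of 4096 entries. This is a sketch under that assumption, not VTD-XML's internal storage class, and the name `RecordBuffer` is hypothetical. Exact array header sizes vary by JVM, so the 16-byte figure above should be read as indicative.

```java
import java.util.ArrayList;
import java.util.List;

public final class RecordBuffer {
    // Records per block, matching the 4096 figure in the text.
    static final int BLOCK_SIZE = 4096;

    // Each block is one long[]: one allocation and one array header per 4096 records,
    // and each record is a primitive value, so it carries no object header of its own.
    private final List<long[]> blocks = new ArrayList<>();
    private int size = 0;

    public void add(long record) {
        int slot = size % BLOCK_SIZE;
        if (slot == 0) {
            blocks.add(new long[BLOCK_SIZE]); // current block is full (or none yet)
        }
        blocks.get(blocks.size() - 1)[slot] = record;
        size++;
    }

    public long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return blocks.get(index / BLOCK_SIZE)[index % BLOCK_SIZE];
    }

    public int size() { return size; }

    public int blockCount() { return blocks.size(); }

    public static void main(String[] args) {
        RecordBuffer buf = new RecordBuffer();
        for (int i = 0; i < 10_000; i++) {
            buf.add(i * 2L);
        }
        System.out.println(buf.size() + " records in " + buf.blockCount() + " blocks");
    }
}
```

Appending never copies existing records, unlike a single array that doubles in size. An index lookup is one division and one modulo.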
Our benchmarks indicate that VTD-XML processes XML at a performance
level similar to (and often better than) SAX with a NULL content
handler. Memory usage is typically 1.3x to 1.6x the size of the XML
document, where 1.0x is the document itself.
Other features included in this release:
* Incremental update -- VTD-XML allows one to modify the content of an
XML document without touching unrelated parts of the document.
* Content extraction -- VTD-XML also allows one to pull an element out
of an XML document in its serialized form. This can be an important
feature for partial signing/encryption of SOAP payloads.
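Because the original bytes stay intact, both features reduce to operations on byte ranges. The sketch below assumes the offset and length of the target are already known (in VTD-XML they would come from the VTD records; here they are found with `indexOf` on an ASCII document, where character and byte indices coincide). The class and method names are hypothetical, and the sketch does not show how an index would be adjusted after an edit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class SpliceDemo {
    // Content extraction: an element's serialized form is a byte range of the
    // original document, so it can be copied out verbatim.
    static byte[] extract(byte[] doc, int offset, int length) {
        return Arrays.copyOfRange(doc, offset, offset + length);
    }

    // Incremental update: replace one byte range and copy everything else
    // verbatim, without re-serializing the rest of the document.
    static byte[] replace(byte[] doc, int offset, int length, byte[] replacement) {
        byte[] out = new byte[doc.length - length + replacement.length];
        System.arraycopy(doc, 0, out, 0, offset);
        System.arraycopy(replacement, 0, out, offset, replacement.length);
        System.arraycopy(doc, offset + length, out, offset + replacement.length,
                doc.length - offset - length);
        return out;
    }

    public static void main(String[] args) {
        byte[] doc = "<env><body><amount>10</amount></body></env>".getBytes(StandardCharsets.UTF_8);
        String s = new String(doc, StandardCharsets.UTF_8);

        // Locate the element (ASCII only, so char index == byte index).
        int start = s.indexOf("<amount>");
        int end = s.indexOf("</amount>") + "</amount>".length();
        byte[] fragment = extract(doc, start, end - start);
        System.out.println(new String(fragment, StandardCharsets.UTF_8));

        // Replace the 2-byte text "10" with "25"; all other bytes are copied unchanged.
        int textStart = start + "<amount>".length();
        byte[] updated = replace(doc, textStart, 2, "25".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(updated, StandardCharsets.UTF_8));
    }
}
```

The extracted fragment is exactly the bytes that appeared in the document, which is the property that matters when signing or encrypting part of a message.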
In upcoming releases, we plan to add persistence support so that one
can save VTD records to disk, and load them back, along with the XML
documents, avoiding repetitive parsing in read-only situations. XPath
support is also on the development roadmap. However, we would like to
collect as many suggestions and bug reports as possible before taking
the next step. Your input and suggestions are very important in making
VTD-XML a truly useful XML processor.
All the best,