Alex, see comments inline below...
On Nov 22, 2004, at 1:01 PM, Aleksander Slominski wrote:
> Wolfgang Hoschek wrote:
>> This is to announce the nux-1.0beta2 release
>> Nux is a small, straightforward, and surprisingly effective
>> open-source extension of the XOM XML library.
> hi Wolfgang,
> the natural question is: how does it compare to XBIS?
Among other things, we also benchmarked with the test XML files that
come with XBIS (thanks to Dennis Sosnoski for the great work - much
appreciated). It would be interesting to directly compare performance
with XBIS, but so far we have not done so, for two reasons:
- XBIS currently does not work with XOM (it is missing some XMLReader
features/properties that XOM requires)
- XBIS measures performance from and to SAX event streams. bnux measures
performance from XOM documents to byte arrays, and back. bnux includes
XOM tree walking, tree building, and the inherent XOM XML
wellformedness checks, which is significantly more expensive (and also
more useful, since it measures delivering data from/to a large number of
real-world applications, rather than low-level SAX apps). In other
words, the benchmarking methodology is different. It would not be an
apples-to-apples comparison. Might still be interesting, though.
> can it be divorced from XOM?
The concept is applicable to any DOM-like tree model and probably any
infoset-based model. The implementation is specific to XOM.
>> Features include:
>> • Seamless W3C XQuery and XPath support for XOM, through
>> • Efficient and flexible pools and factories for XQueries,
>> XSL Transforms, as well as Builders that validate against various
>> schema languages, including W3C XML Schemas, DTDs, RELAX NG,
>> Schematron, etc.
>> • Serialization and deserialization of XOM XML documents to
>> and from an efficient and compact custom binary XML data format
>> (bnux format), without loss or change of any information.
>> • For simple and complex continuous queries and/or
>> transformations over very large or infinitely long XML input, a
>> convenient streaming path filter API combines full XQuery support
>> with straightforward filtering.
>> • Glue for integration with JAXB and for queries over
>> ill-formed HTML.
>> • Well documented API. Ships in a jar file that weighs just
>> 60 KB.
>> XOM serialization and deserialization performance is more than good
>> enough for most purposes. However, for particularly stringent
>> performance requirements this release adds "bnux", an option for
>> lightning-fast binary XML serialization and deserialization.
> did you compare BNUX and XBIS performance?
>> Contrasting bnux with XOM:
>> • Serialization speedup: 2-7 (10-35 MB/s vs. 5 MB/s)
>> • Deserialization speedup: 4-10 (20-50 MB/s vs. 5 MB/s)
>> • XML data compression factor: 1.5 - 4
>> For a detailed discussion and background see
> XOM is a tree model so how do you do streaming - is it by streaming
> partial XOM tree construction/deconstruction when you access data
> (overriding |endElement()| in |NodeFactory|) and manually keep
> detach()-ing nodes or just letting them be GCed?
Currently we do not do streaming.
The bnux serialization algorithm is a three-pass batch algorithm, hence
buffer-oriented, not stream-oriented. It has a throughput profile with
short critical paths, rather than a low latency profile with long
critical paths, rendering it ideal for large volumes of small to
medium-sized XML documents, and impractical for individual documents
that do not fit into main memory. The bnux deserialization algorithm
is a single pass algorithm, and could in theory be streamed through a
NodeFactory, but the current implementation does not do so.
The serialization algorithm could be restructured to be a single pass
algorithm at the expense of compression; performance would probably be
roughly the same. Turning the single pass algorithm into a chunked
streaming algorithm using "pages" would be possible but complicated,
probably reducing performance. We have not tried it, though.
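To illustrate the process-and-discard idea behind streaming through a
factory, here is a minimal sketch using the JDK's built-in SAX parser
rather than XOM (so it is self-contained); the XOM analogue would
override NodeFactory.finishMakingElement() and return an empty Nodes so
each finished subtree can be garbage collected:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.helpers.DefaultHandler;

/**
 * Streams an XML document and processes each element as soon as
 * endElement fires, keeping only running counts in memory instead of
 * building a full tree - the same discard-after-processing pattern a
 * streaming NodeFactory would use.
 */
public class StreamingElementCounter extends DefaultHandler {

    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Process-and-discard: record the element, retain no subtree.
        counts.merge(qName, 1, Integer::sum);
    }

    public static Map<String, Integer> count(String xml) {
        try {
            StreamingElementCounter handler = new StreamingElementCounter();
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    handler);
            return handler.counts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count("<log><entry/><entry/><entry/></log>");
        System.out.println(c.get("entry")); // 3
    }
}
```

The memory footprint stays bounded by document depth rather than
document size, which is what makes the approach attractive for inputs
that do not fit in main memory.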
> what are use cases for nux: what do you plan to use it for?
The algorithm is primarily intended for tightly coupled
high-performance systems exchanging large volumes of XML data over
networks, as well as for compact main memory caches and for short-term
storage as BLOBs in backend databases or files (e.g. "session" data
with limited duration).
> are use cases related to XML Binary Characterization
They might fit into that "diverse" bag-of-things as well...
> i am a bit disappointed that scientific requirements are completely
> omitted from the XBC use cases - the closest i could find is
> http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips over the
> whole issue of how to transfer arrays of doubles without changing
> endianness ...
I may be wrong, but conversion of doubles to strings and back seems to
be the main CPU drain here, rather than byte swapping. Try doing this for
billions of floats, gulp. Hence one would need to ship arrays of
doubles in IEEE floating point representation or native format to avoid
string conversions, perhaps most appropriately as an "attachment"
according to the various related standards out there. When working with
a binary representation, one could also extend DOM-like APIs in
somewhat counter-intuitive manners, with subclasses like
DoubleArrayText, converting from double to IEEE floating point and
back, or similar.
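As a rough sketch of what shipping doubles without string conversions
could look like (the class name here is illustrative, not part of any
standard API): each value is a fixed 8 bytes in IEEE 754 form, and with
a declared byte order the receiver reinterprets the buffer directly, no
parsing or formatting involved.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Encodes an array of doubles as raw IEEE 754 bytes in a fixed byte
 * order, and decodes them back. Compared to decimal strings, there is
 * no per-value formatting or Double.parseDouble cost, and byte
 * swapping (if the platform needs it at all) is far cheaper than
 * string conversion.
 */
public class DoubleArrayCodec {

    // Encode: one bulk put, 8 bytes per value, no string formatting.
    public static byte[] encode(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.BIG_ENDIAN);
        buf.asDoubleBuffer().put(values);
        return buf.array();
    }

    // Decode: reinterpret the same bytes, no string parsing.
    public static double[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        double[] values = new double[bytes.length / Double.BYTES];
        buf.asDoubleBuffer().get(values);
        return values;
    }

    public static void main(String[] args) {
        double[] in = {1.5, -2.25, 3.14159};
        double[] out = decode(encode(in));
        System.out.println(out[1]); // -2.25
    }
}
```

A hypothetical DoubleArrayText node, as mentioned above, would
essentially wrap such a codec behind a DOM-like Text interface.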
> we did a lot of work in the past related to XML performance (at Indiana
> University and Binghamton) and are very concerned that whatever binary
> XML gets characterized/standardized in the W3C will not be of much use
> for scientific computing and grids ...
You would need strong advocates/evangelists, it seems.