Re: [xml-dev] XML Performance in a Transaction



Michael Champion said:

> See
> http://lists.w3.org/Archives/Public/www-ws/2004Oct/att-0032/MNicola_CIKM_2003_1_.pdf
> "XML Parsing - A Threat to Database Performance."  Be forewarned that the
> conclusion may be unpalatable:

By rights, it seems there should be some market for a highly optimized
XML parser. If you need high performance, you seek out high-performance
libraries; if there are none, you get them built, internally or
externally. But I don't recall ever having seen any requests on XML-DEV
for high-speed parsers: certainly none with any dollars behind them.

If some companies got together and said "We will pay $$$ for a
higher-performance XML parser", they would get one. A $10,000 first prize
and a $5,000 second prize for the winning parser on specified data,
schemas and platforms would be enough to stimulate a lot of hackers and
researchers, not to mention prompting people with in-house, private
parsers to open-source them. When you move to an Open Source software
economy, the question for business becomes "How do we stimulate
development in areas that help us?"

Only this week I was listening to people from a client airline who had to
write their own XML parser in PL/I for optimized access to mainframe DB2.
That such a parser did not already exist suggests to me that organizations
using mainframe/transaction/high-volume databases need to adopt a new,
proactive stance in getting high-performance, open source XML software
written. Passivity in this area will ensure that only unsuitable
implementations are available to them.

If you look at, say, Apache Xerces and Xalan, you can see that
hyper-efficiency plays little part in the game. The same is true, by and
large, of the other open source software. Hyper-efficient design is not
an optimization that can be tacked on afterwards; it has to be the core of
the design, and you cannot expect a general-purpose, cross-platform parser
to be optimal. (For example, one trick that goes as far back as OmniMark's
predecessor in the late 80s, I believe, was to ship two parsing engines:
one optimized for the most common case and encoding, which for XML would
be an entity-less document, and another to handle all the other cases.)
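
Below is a minimal sketch, in Java, of that kind of two-parser dispatch.
The class name, the probing heuristics and the fastParse placeholder are
mine, for illustration only; a real fast path would be a hand-tuned,
UTF-8-only, entity-free scanner rather than the standard SAX parser used
here as a stand-in.

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Sketch of the "two parsers" trick: probe the start of the document and
 * use a hand-optimized fast path only when the cheap assumptions hold
 * (UTF-8, no DTD, no entity references beyond the built-ins); otherwise
 * fall back to a fully general parser.
 */
public class DispatchingParser {

    public void parse(byte[] doc, DefaultHandler handler) throws Exception {
        if (isFastPathSafe(doc)) {
            fastParse(doc, handler);                       // optimized common case (placeholder below)
        } else {
            SAXParserFactory.newInstance().newSAXParser()  // general, spec-complete parser
                    .parse(new ByteArrayInputStream(doc), handler);
        }
    }

    private boolean isFastPathSafe(byte[] doc) {
        // Probe only a bounded prefix; a real implementation would re-dispatch
        // if a DOCTYPE or entity reference turned up later in the document.
        int limit = Math.min(doc.length, 4096);
        String head = new String(doc, 0, limit, StandardCharsets.ISO_8859_1);
        boolean otherEncoding = head.contains("encoding=")
                && !head.toLowerCase().contains("utf-8");
        boolean hasDoctype = head.contains("<!DOCTYPE");
        boolean hasEntityRef = head.matches("(?s).*&(?!amp;|lt;|gt;|quot;|apos;|#).*");
        return !otherEncoding && !hasDoctype && !hasEntityRef;
    }

    private void fastParse(byte[] doc, DefaultHandler handler) throws Exception {
        // Placeholder: a tight, UTF-8-only, entity-free scanner would go here;
        // for now it just delegates to the general parser so the sketch runs.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(doc), handler);
    }
}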

My expectation is that XML parsing can be significantly sped up by better
use of SSE intrinsics*, by integrating parsing and transcoding, by doing
validation and type assignment with streaming path-matching rather than
automata (i.e. transforming horizontal grammars into vertical paths), and
by parsing numbers directly into native data types, for example. (The
last of these is sketched below.) I am sure many other people have their
own shopping list of good ideas, but as far as I know there are no
parsers that implement any of these things at the moment. Parser
innovation has stalled, and it surely should be an issue of serious
concern (and by serious concern I mean $$$) to high-volume companies to
get it restarted.
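
As an illustration of the last item, here is a small Java routine (the
names and buffer offsets are mine, not from any existing parser) that
converts a run of ASCII digits straight from the parser's byte buffer
into an int, with no intermediate String and no character decoding:

/**
 * Sketch of "direct parsing to native data types": convert a decimal
 * integer straight from the parser's byte buffer into an int.
 */
public final class DirectNumbers {

    /** Parses the ASCII digits in buf[start, end) as a non-negative int. */
    public static int parseInt(byte[] buf, int start, int end) {
        int value = 0;
        for (int i = start; i < end; i++) {
            int digit = buf[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at offset " + i);
            }
            value = value * 10 + digit;   // no String allocation, no charset decode
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] element = "<qty>12345</qty>"
                .getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        // A typed parser would hand the content span [5, 10) straight to this routine.
        System.out.println(parseInt(element, 5, 10));   // prints 12345
    }
}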

The other aspect is that there is no "type-aware SAX" API. Without one,
parsers, whether open source or proprietary, are not interchangeable at
the typed level. Obviously this applies most to Java, but the principle
is the same everywhere: we need agreements at the interfaces (a.k.a.
standards).
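
For what it is worth, here is a rough sketch of what a type-aware SAX
extension might look like in Java. It is purely hypothetical, not an
existing or proposed interface; the idea is that a schema-validating
parser would report the assigned type and an already-converted native
value alongside the usual character events.

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

/**
 * Hypothetical "type-aware SAX" handler: in addition to the usual events,
 * a validating parser reports the schema type and a converted native value
 * for simple content, so applications need not reparse the text themselves.
 */
public interface TypedContentHandler extends ContentHandler {

    /**
     * Reported after the character data of an element or attribute with
     * simple content, once validation has assigned it a type.
     *
     * @param typeNamespace namespace of the schema type, e.g. the XML Schema namespace
     * @param typeName      local name of the type, e.g. "int", "date", "decimal"
     * @param nativeValue   the value converted to a native object (Integer, Double, ...)
     */
    void typedValue(String typeNamespace, String typeName, Object nativeValue)
            throws SAXException;
}

Only with some agreed interface of this kind, whatever its final shape,
could one typed parser be swapped for another.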

Cheers
Rick Jelliffe

* See http://www.oreillynet.com/digitalmedia/blog/2005/11/ and search for
Intrinsics. The O'Reilly blog site is being altered and is a complete mess
at the moment, so sorry about the odd format for this archive.





 
