   RE: [xml-dev] XML Performance in a Transaction


--- Michael Champion <michael.champion@hotmail.com>
wrote:

> > Date: Wed, 22 Mar 2006 16:19:56 -0500
> > From: d_a_carver@yahoo.com
> > To: xml-dev@lists.xml.org
> > Subject: [xml-dev] XML Performance in a Transaction
> >
> > I've been requested to provide some numbers to show that actual XML
> > validation results and parsing are a small portion of the overall
> > transaction process, when dealing with XML in a B2B process. Any
> > information that can be provided would be appreciated.
>
> See
> http://lists.w3.org/Archives/Public/www-ws/2004Oct/att-0032/MNicola_CIKM_2003_1_.pdf
> "XML Parsing - A Threat to Database Performance."
> Be forewarned that the conclusion may be unpalatable:
>  
> "We reported real-world experiences of using XML with databases
> where XML parsing was the main performance bottleneck. This
> motivated an analysis of the cost of SAX parsing and DTD &
> XML schema validation. We find that parsing even small XML
> documents without validation can increase the CPU cost of a
> relational database transaction by 2 to 3 times or more. Parsing
> with schema validation and without grammar caching can increase
> transaction cost by 10 times or more. This is a serious problem for
> high performance transaction oriented database applications which
> intend to use XML"

As everyone else has said, it all depends on your usage. But I
personally think the above comment is not closely tied to current
reality. I did a quick test on my development system (see below for
details), and _raw_ parsing speed (a Java streaming parser that scans
through the whole doc, just counting stats for lengths etc.) was as
follows:

* 43 MBps for a big XML export/import file (1 MB, no namespaces,
  but namespace-aware parser)
  [file parsed from disk 1093 times during 30 seconds
  ~= 30 milliseconds per parse]
* 37 MBps for a big StarOffice XML content file (500 kB, fully
  namespaced, lots of attributes)
  [2182 reads over 30 seconds ~= 15 milliseconds per parse]
* 8 MBps for a small SOAP request (718 bytes)
  [322,000 reads over 30 seconds ~= 0.1 milliseconds per parse];
  the lower throughput is probably due to the constant
  overhead of instantiating the parser instance.
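
The per-parse times and throughput above follow from the parse counts
by simple division. A small sketch of that arithmetic, in case anyone
wants to redo it with their own counts (the class and method names are
made up for illustration, and 1 MB is assumed to mean 1,000,000 bytes):

```java
class ThroughputArithmetic {
    /** Average wall-clock milliseconds spent per parse. */
    static double millisPerParse(long parses, double seconds) {
        return seconds * 1000.0 / parses;
    }

    /** Throughput in MB/s, assuming 1 MB = 1,000,000 bytes. */
    static double megabytesPerSecond(long docBytes, long parses, double seconds) {
        return (double) docBytes * parses / seconds / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Parse counts and the 30-second window are the figures from the list above.
        System.out.printf("export file:   %.1f ms per parse%n", millisPerParse(1093, 30));
        System.out.printf("StarOffice:    %.1f ms per parse%n", millisPerParse(2182, 30));
        System.out.printf("SOAP request:  %.3f ms per parse%n", millisPerParse(322_000, 30));
        // Only the SOAP request has an exact byte size (718), so only its
        // throughput can be recomputed without guessing at file sizes.
        System.out.printf("SOAP request:  %.2f MB/s%n",
                megabytesPerSecond(718, 322_000, 30));
    }
}
```

The results land close to the rounded figures quoted in the list; the
two larger files are only given as approximate sizes, so their MBps
values are not recomputed here.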

XML content was read from a file, although in practice Linux caches
repeated disk access, so it's equivalent to parsing from memory
(meaning I/O should not matter a lot). The system is a plain old 3 GHz
single-CPU Intel Linux workstation with a reasonably fast SCSI disk.
The test was single-threaded, with a 10-second warm-up period for the
parser (parsing the same file as during the test).
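
For anyone who wants to repeat this kind of measurement, here is a
minimal sketch of such a test loop. It is not the actual test code: it
assumes the standard StAX API (javax.xml.stream) with whatever parser
the JDK provides, it parses from an in-memory byte array rather than a
file, and the class and method names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StreamingParseBench {
    /** Scans the whole document; returns {element count, total text length}. */
    public static long[] scan(XMLInputFactory factory, byte[] doc)
            throws XMLStreamException {
        // A new reader per document, as a real application would need.
        XMLStreamReader reader =
                factory.createXMLStreamReader(new ByteArrayInputStream(doc));
        long elements = 0, textChars = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    elements++;
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    textChars += reader.getTextLength();
                }
            }
        } finally {
            reader.close();
        }
        return new long[] { elements, textChars };
    }

    /** Warms up, then parses repeatedly for testMillis; returns MB/s (1 MB = 10^6 bytes). */
    public static double measureMBps(byte[] doc, long warmupMillis, long testMillis)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);

        long warmupEnd = System.nanoTime() + warmupMillis * 1_000_000L;
        while (System.nanoTime() < warmupEnd) {
            scan(factory, doc);
        }

        long start = System.nanoTime();
        long deadline = start + testMillis * 1_000_000L;
        long parses = 0;
        do {
            scan(factory, doc);
            parses++;
        } while (System.nanoTime() < deadline);

        double seconds = (System.nanoTime() - start) / 1e9;
        return parses * (double) doc.length / seconds / 1e6;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder document; substitute the bytes of a real file to test.
        byte[] doc = "<a><b>hi</b><c/></a>".getBytes(StandardCharsets.UTF_8);
        // 10 s warm-up and 30 s measurement, matching the setup described above.
        System.out.printf("%.1f MB/s%n", measureMBps(doc, 10_000, 30_000));
    }
}
```

Numbers from a loop like this depend heavily on the parser
implementation, the JVM and the document, so treat them as one data
point only.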

For comparison, simply scanning the file from Java is less than 50%
faster than XML parsing.

For my purposes, at least, XML parsing itself is not the most
significant performance overhead: it's all the XML processing above
and beyond parsing. But I just parse XML content and use it, with no
validation (DTD validation seems to add 50% overhead for me [-> 35%
lower throughput], if DTD caching is enabled... just as one data
point).

Your mileage may vary. Specifically, if you have to use an in-memory
document model (DOM etc.), prepare to reduce throughput by half an
order of magnitude (compared to the simple streaming use case).
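
To illustrate what the in-memory alternative looks like, here is a
rough sketch using the standard JAXP DOM API. It builds the full tree
before any processing can start, which is where the extra cost comes
from. The class and method names are hypothetical, and this is not the
code behind the estimate above.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class DomParseSketch {
    /** Recursively counts element nodes in the subtree rooted at n. */
    public static int countElements(Node n) {
        int count = (n.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countElements(c);
        }
        return count;
    }

    /** Builds a complete DOM tree for the document, then walks it. */
    public static int parseAndCount(byte[] doc) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Every element, attribute and text node is allocated here,
        // before the caller sees any content.
        Document tree = builder.parse(new ByteArrayInputStream(doc));
        return countElements(tree);
    }
}
```

Timing parseAndCount in the same kind of loop as a streaming scan gives
a like-for-like comparison on your own documents.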

-+ Tatu +-

