xml-dev - RE: [xml-dev] XML Performance in a Transacation

To: Michael Champion <michael.champion@hotmail.com>, d_a_carver@yahoo.com, xml-dev@lists.xml.org
Subject: RE: [xml-dev] XML Performance in a Transacation
From: Tatu Saloranta <cowtowncoder@yahoo.com>
Date: Wed, 22 Mar 2006 15:07:45 -0800 (PST)
In-reply-to: <BAY114-W1EBB1B3A0DB6B09329E9E99D90@phx.gbl>

--- Michael Champion <michael.champion@hotmail.com> wrote:

> > Date: Wed, 22 Mar 2006 16:19:56 -0500
> > From: d_a_carver@yahoo.com
> > To: xml-dev@lists.xml.org
> > Subject: [xml-dev] XML Performance in a Transacation
> >
> > I've been requested to provide some numbers to show that actual XML
> > validation results and parsing are a small portion of the overall
> > transaction process, when dealing with XML in a B2B process.  Any
> > information that can be provided would be appreciated.
>
> See
> http://lists.w3.org/Archives/Public/www-ws/2004Oct/att-0032/MNicola_CIKM_2003_1_.pdf
> "XML Parsing - A Threat to Database Performance."
> Be forewarned that the conclusion may be unpalatable:
>
> "We reported real-world experiences of using XML with databases
> where XML parsing was the main performance bottleneck. This
> motivated an analysis of the cost of SAX parsing and DTD &
> XML schema validation. We find that parsing even small XML
> documents without validation can increase the CPU cost of a relational
> database transaction by 2 to 3 times or more. Parsing with
> schema validation and without grammar caching can increase
> transaction cost by 10 times or more. This is a serious problem for
> high performance transaction oriented database applications which
> intend to use XML"

As everyone else has said, it all depends on your
usage. But I personally think the above conclusion does not
closely reflect current reality. I did a quick test on
my development system (see below for details), and
_raw_ parsing speed (a Java streaming parser that scans
through the whole document, just gathering statistics such
as text lengths) was as follows:

* 43 MB/s for a big XML export/import file (1 MB, no
  namespaces, but a namespace-aware parser)
  [file parsed 1,093 times in 30 seconds, i.e. ~30
  milliseconds per parse]
* 37 MB/s for a big StarOffice XML content file (500 kB,
  fully namespaced, lots of attributes)
  [2,182 parses in 30 seconds, i.e. ~15 milliseconds per parse]
* 8 MB/s for a small SOAP request (718 bytes)
  [322,000 parses in 30 seconds, i.e. ~0.1 milliseconds per parse];
  the lower throughput is probably due to the constant
  overhead of instantiating a parser for each document.

XML content was read from a file, although in practice
Linux caches repeated disk access, so this is equivalent
to parsing from memory (meaning I/O should not matter
much). The system is a plain old 3 GHz single-CPU Intel
Linux workstation with a reasonably fast SCSI disk.
The test was single-threaded, with a 10-second warm-up
period for the parser (parsing the same file as during
the test).
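A minimal sketch of the kind of harness described above, assuming the JDK's built-in StAX API (`javax.xml.stream`); the post does not name the parser it used. The class and method names (`ParseBenchmark`, `parseOnce`, `measure`) and the sample document are hypothetical. It warms up first, then counts complete parses within a fixed time budget and converts that count to MB/s.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ParseBenchmark {
    // Results are accumulated here so the JIT cannot treat the parsing as dead code.
    static volatile long sink;

    /** One full streaming pass; returns total text length as a cheap statistic. */
    static long parseOnce(XMLInputFactory factory, byte[] doc) throws XMLStreamException {
        XMLStreamReader r = factory.createXMLStreamReader(new ByteArrayInputStream(doc));
        long textChars = 0;
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    textChars += r.getTextLength();
                }
            }
        } finally {
            r.close();
        }
        return textChars;
    }

    /** Parses repeatedly until the time budget is used up; returns the number of completed parses. */
    static long parseRepeatedly(XMLInputFactory factory, byte[] doc, long millis) throws XMLStreamException {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long parses = 0;
        while (System.nanoTime() < deadline) {
            sink += parseOnce(factory, doc);
            parses++;
        }
        return parses;
    }

    /** Warms up, then measures; returns throughput in MB/s (10^6 bytes per second). */
    static double measure(byte[] doc, long warmupMillis, long testMillis) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance(); // reused; a new reader per document
        parseRepeatedly(factory, doc, warmupMillis);              // lets the JIT compile hot paths first
        long start = System.nanoTime();
        long parses = parseRepeatedly(factory, doc, testMillis);
        double seconds = (System.nanoTime() - start) / 1e9;
        return parses * (double) doc.length / seconds / 1e6;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical small SOAP-like message; substitute real test files.
        byte[] doc = ("<env:Envelope xmlns:env='http://www.w3.org/2003/05/soap-envelope'>"
                + "<env:Body><m:ping xmlns:m='urn:example:ping'>hello</m:ping></env:Body></env:Envelope>")
                .getBytes(StandardCharsets.UTF_8);
        // The post used a 10 s warm-up and a 30 s run; shortened here.
        System.out.printf("%.1f MB/s%n", measure(doc, 1_000, 3_000));
    }
}
```

Because the `XMLInputFactory` is created once and reused, the remaining fixed cost per document is mostly reader setup, which is the overhead the post suspects dominates for tiny SOAP messages.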

For comparison, simply scanning through the file from Java
(reading it without parsing) is less than 50% faster than
XML parsing.
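One way to reproduce this comparison, sketched under the assumption that "simple scanning" means decoding the bytes to characters and looking at each one; the post does not say exactly what its baseline did, so the ratio this prints may differ from the figure above. The names (`ScanVsParse`, `rawScan`, `parse`) and the generated document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ScanVsParse {
    /** Baseline: decode the bytes as UTF-8 and count '<' characters, with no XML parsing. */
    static long rawScan(InputStream in) throws IOException {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            long angleBrackets = 0;
            int n;
            while ((n = reader.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '<') angleBrackets++;
                }
            }
            return angleBrackets;
        }
    }

    /** Full streaming parse that counts start tags. */
    static long parse(XMLInputFactory factory, InputStream in) throws XMLStreamException {
        XMLStreamReader r = factory.createXMLStreamReader(in);
        long elements = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) elements++;
            }
        } finally {
            r.close();
        }
        return elements;
    }

    interface Task { void run() throws Exception; }

    static long nanosFor(Task task, int iterations) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < 10_000; i++) sb.append("<item id='").append(i).append("'>text</item>");
        byte[] doc = sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
        XMLInputFactory factory = XMLInputFactory.newInstance();
        Task scanTask = () -> rawScan(new ByteArrayInputStream(doc));
        Task parseTask = () -> parse(factory, new ByteArrayInputStream(doc));
        nanosFor(scanTask, 100);   // warm-up for both code paths
        nanosFor(parseTask, 100);
        double ratio = (double) nanosFor(parseTask, 300) / nanosFor(scanTask, 300);
        System.out.printf("parse time / scan time: %.2f%n", ratio);
    }
}
```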

For my purposes, at least, XML parsing itself is not
the most significant performance overhead: it's all the
XML processing above and beyond parsing. But I just
parse XML content and use it, with no validation. (Just
as one data point: DTD validation seems to add about 50%
overhead for me [-> roughly 35% lower throughput], if
DTD caching is enabled.)
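The figure above is for DTD validation with DTD caching, which is usually enabled through parser-specific settings rather than a standard JAXP call. The closest standard analogue is W3C XML Schema validation through `javax.xml.validation`, where a compiled `Schema` can be kept and reused across documents, i.e. the "grammar caching" the quoted paper mentions. A minimal sketch; the schema and the names `CachedSchemaValidation` and `isValid` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class CachedSchemaValidation {
    // Hypothetical schema used only for illustration.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='qty' type='xs:positiveInteger'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Compiled once and reused: Schema objects are immutable and thread-safe.
    static final Schema SCHEMA = compile();

    static Schema compile() {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    /** Validates one document against the cached grammar; Validators are cheap but not thread-safe. */
    static boolean isValid(String xml) {
        Validator validator = SCHEMA.newValidator(); // no ErrorHandler set, so errors are thrown
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><qty>3</qty></order>"));  // true
        System.out.println(isValid("<order><qty>-1</qty></order>")); // false
    }
}
```

The expensive step, compiling the grammar, happens once; per-document work is only creating a `Validator` and running it, which is what keeps validation overhead from scaling the way the quoted "without grammar caching" numbers do.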

Your mileage may vary. Specifically, if you have to
use an in-memory document model (DOM etc.), expect
throughput to drop by about half an order of magnitude,
i.e. roughly 3x (compared to the simple streaming use case).
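To check the DOM cost on your own data, a sketch that times a DOM build (`javax.xml.parsers`) against a StAX pass over the same synthetic document. The class and method names are hypothetical, and the printed ratio depends on the JVM, the parser implementation and the document shape, so treat it as a local measurement rather than a confirmation of the figure above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;

public class DomVsStream {
    /** Builds the whole tree in memory, then counts its elements. */
    static int countWithDom(DocumentBuilder builder, byte[] doc) throws Exception {
        Document d = builder.parse(new ByteArrayInputStream(doc));
        return d.getElementsByTagName("*").getLength();
    }

    /** Counts elements in a single streaming pass without building a tree. */
    static int countWithStax(XMLInputFactory factory, byte[] doc) throws Exception {
        XMLStreamReader r = factory.createXMLStreamReader(new ByteArrayInputStream(doc));
        int n = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) n++;
            }
        } finally {
            r.close();
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < 20_000; i++) sb.append("<row id='").append(i).append("'><v>x</v></row>");
        byte[] doc = sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        XMLInputFactory xif = XMLInputFactory.newInstance();
        for (int i = 0; i < 20; i++) { countWithDom(builder, doc); countWithStax(xif, doc); } // warm-up
        long t0 = System.nanoTime();
        for (int i = 0; i < 50; i++) countWithDom(builder, doc);
        long t1 = System.nanoTime();
        for (int i = 0; i < 50; i++) countWithStax(xif, doc);
        long t2 = System.nanoTime();
        System.out.printf("DOM time / StAX time: %.1f%n", (double) (t1 - t0) / (t2 - t1));
    }
}
```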

-+ Tatu +-
