xml-dev - Re: [xml-dev] XML-enabled databases, XQuery APIs


To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] XML-enabled databases, XQuery APIs
From: David Lyon <david.lyon@computergrid.net>
Date: Tue, 19 Apr 2005 19:31:10 +1000
In-reply-to: <001301c544b3$a25a5170$1601a8c0@DURANTE>
References: <200504181843.j3IIhFvI097554@www9.cruzio.com> <42649218.5010404@rpbourret.com> <001301c544b3$a25a5170$1601a8c0@DURANTE>
User-agent: KMail/1.7.1

Hi Ken,

You've found some quite interesting references there.

One point I would like to pick up comes from the Python link
(http://www.oreillynet.com/pub/wlg/6291), which mentions the
assumption that parsers are handling 8-bit text documents,
when Unicode docs may be the norm in the future.

As an old assembly language programmer, I can't help
but rub my hands with joy as this seems to be a sort of 32-bit
challenge.

Unicode text is usually stored as UTF-8, which (according to my understanding)
is an 8-bit, variable-width encoding. That is, if a character falls outside the
ASCII range, it is written as a lead byte followed by a second, third or fourth
byte as required.
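
A minimal sketch in Python of that byte-per-character growth, assuming the
encoding in question is UTF-8 (the four sample characters are just
illustrative picks, not anything from the benchmarks):

```python
# How many bytes does UTF-8 need for one character?
# The samples are arbitrary: U+0041, U+00E9, U+20AC and U+1D11E.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

def utf8_len(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))

lengths = [utf8_len(ch) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} takes {n} byte(s)")
```

The further a character sits from the ASCII range, the more bytes it takes,
up to four.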

What a pain to chomp through on a 32-bit processor.

So to do really *fast* Unicode stuff, ideally, the in-memory view wouldn't 
store the characters as variable-width 8-bit sequences, but as fixed-width 
32-bit (4-byte) units, or even 64-bit (8-byte) ones.
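
A rough sketch of the difference, in Python, assuming UTF-8 for the 8-bit
form and UTF-32 (little-endian) for the 32-bit form. The helper names
char_at_utf32 and char_at_utf8 are made up for illustration:

```python
import struct

text = "na\u00efve \U0001d11e"      # 7 characters, two of them non-ASCII
utf8 = text.encode("utf-8")         # variable width: 1 to 4 bytes per character
utf32 = text.encode("utf-32-le")    # fixed width: 4 bytes per character, no BOM

def char_at_utf32(buf: bytes, i: int) -> str:
    """Constant-time lookup: character i starts at byte offset 4 * i."""
    (code_point,) = struct.unpack_from("<I", buf, 4 * i)
    return chr(code_point)

def char_at_utf8(buf: bytes, i: int) -> str:
    """Linear-time lookup: find character starts by skipping 10xxxxxx bytes."""
    starts = [pos for pos, byte in enumerate(buf) if (byte & 0xC0) != 0x80]
    end = starts[i + 1] if i + 1 < len(starts) else len(buf)
    return buf[starts[i]:end].decode("utf-8")

print(len(text), len(utf8), len(utf32))
print(char_at_utf32(utf32, 6) == char_at_utf8(utf8, 6))
```

The fixed-width form finds any character with one multiplication, while the
8-bit form has to scan from the start. The price is memory: the same seven
characters take 28 bytes instead of 11.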

Of course, I think this is a few months away. But true 32-bit in-memory 
processing of XML might be fun.

Maybe we're in for a year of benchmarks and crunch testing.

btw, if you have 2500 documents to crunch, you can add machines to your LAN and 
crunch them on multiple machines. The job doesn't have to run as one purely 
serial process.
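
A back-of-envelope sketch in Python of splitting the work, under loud
assumptions: the file names are hypothetical, the 0.05 sec per document is
the Expat figure Ken quotes below, and the machines are taken to be identical
with no network or coordination overhead (which real setups won't manage):

```python
def partition(docs, machines):
    """Deal documents round-robin into one batch per machine."""
    return [docs[i::machines] for i in range(machines)]

# Hypothetical file names standing in for the 2500 documents.
docs = [f"doc{n:04d}.xml" for n in range(2500)]

SECONDS_PER_DOC = 0.05   # assumption: Expat best time from the quoted benchmark
MACHINES = 4             # assumption: four identical machines on the LAN

batches = partition(docs, MACHINES)
sizes = [len(batch) for batch in batches]

# Idealised wall-clock time: the slowest (largest) batch decides.
ideal_seconds = SECONDS_PER_DOC * max(sizes)

print(sizes)
print(f"ideal parse time: about {ideal_seconds:.1f} sec")
```

With four machines each batch is a quarter of the documents, so the ideal
parse time drops from about 125 seconds to roughly a quarter of that.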

Best regards

David

On Tue, 19 Apr 2005 5:44 pm, Ken North wrote:
> > Michael Kay wrote:
> > > I would be very surprised if XML parsing contributes anything
> > > noticeable to the cost of a database load (in shredding mode).
>
> These benchmarks were run on different testbeds so this isn't an
> apples-to-apples comparison.
>
> This parser performance test of Expat (C, SAX) reported a best time of 0.05
> sec to parse an 884K document with 32K nodes. For 2500 documents, that
> would be approx. 125 seconds.
> http://okmij.org/ftp/Scheme/SSAX-benchmark-1.html
>
> This benchmark compared two SQL APIs. It was written in C and executed in a
> client-server mode, so there was network latency. It used an SQL INSERT,
> not a bulk load.
> http://www.datadirect.com/techres/odbc/docs/wp_odbcvsoci.pdf
>
> The average time to INSERT 2500 rows with ODBC was 23.53 seconds. The
> minimum execution time for an SQL SELECT query to return 2500 rows was 0.05
> sec.
>
> This single-cpu Java benchmark parsed simpler documents than the Expat test
> and the data was closer to the tables in the SQL API test. It took about 2
> seconds to parse 10,000 records using SAX, or about 0.5 seconds to parse
> 2500 records.
> http://www.devsphere.com/xml/benchmark/method.html
>
> This Python benchmark took between 2.32 and 3.97 seconds for a 3.3 MB
> document.
> http://www.oreillynet.com/pub/wlg/6291
>
> My guess is if these benchmarks were all run on the same testbed under the
> same conditions, we'd see a repeatable pattern:
>
> The parsing overhead for loading a database is negligible if we're
> processing simple documents, but becomes more significant as the documents
> increase in size.

-- 
Computergrid : The ones with the most connections win.
