   Re: [xml-dev] XML-enabled databases, XQuery APIs

Hi Ken,

You've found some quite interesting references there.

One point I would like to make comes from the Python link
(http://www.oreillynet.com/pub/wlg/6291), which mentions the
assumption of parsing 8-bit text documents when Unicode docs
may be the norm in the future.

As an old assembly language programmer, I can't help
but rub my hands with joy as this seems to be a sort of 32-bit
challenge.

Unicode text is (according to my understanding) usually stored with an 8-bit
escaping system, UTF-8. That is, if the character falls outside the ASCII
range, it is written as a lead byte followed by a second, third or fourth
byte as required.
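
To make the variable width concrete, here is a minimal Python sketch. It
assumes the byte-oriented scheme described above is UTF-8; the helper name
utf8_lengths and the sample characters are only illustrative.

```python
# Sample characters given as escapes so the source file itself stays ASCII:
# LATIN CAPITAL A, LATIN SMALL E WITH ACUTE, EURO SIGN, GRINNING FACE.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_lengths(chars):
    """Return (code point, number of UTF-8 bytes) for each character."""
    return [(ord(c), len(c.encode("utf-8"))) for c in chars]


for code_point, n_bytes in utf8_lengths(samples):
    # Each character needs a different number of bytes, so finding the
    # n-th character means scanning from the start of the buffer.
    print(f"U+{code_point:04X} takes {n_bytes} byte(s) in UTF-8")
```

The four samples take one, two, three and four bytes respectively, which is
the full range UTF-8 allows.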

What a pain to chomp through on a 32-bit processor.

So to do really *fast* Unicode stuff, ideally the in-memory view wouldn't
store the characters as 8-bit units, but as fixed-width 32-bit (4-byte) or
even 64-bit (8-byte) units, so that every character fills one machine word.
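
A rough sketch of that fixed-width view, again in Python. It assumes the
input is UTF-8 and uses UTF-32 (little-endian) as the 32-bit in-memory form;
to_utf32 and code_point_at are hypothetical helper names, not part of any
parser discussed in this thread.

```python
def to_utf32(utf8_bytes):
    """Re-encode UTF-8 bytes so that every character takes exactly 4 bytes."""
    return utf8_bytes.decode("utf-8").encode("utf-32-le")


def code_point_at(utf32_bytes, index):
    """Fetch the index-th character by plain arithmetic, with no scanning."""
    start = 4 * index
    return int.from_bytes(utf32_bytes[start:start + 4], "little")


# 14 characters, two of them outside ASCII (e-acute and the euro sign).
doc = "<p>caf\u00e9 \u20ac5</p>".encode("utf-8")
wide = to_utf32(doc)

print(len(doc), "bytes as UTF-8,", len(wide), "bytes as UTF-32")
print(hex(code_point_at(wide, 6)), hex(code_point_at(wide, 8)))
```

The trade-off is memory: mostly-ASCII XML grows to roughly four times its
size in exchange for constant-time indexing.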

Of course, I think this is a few months away. But true 32-bit in-memory
processing of XML might be fun.

Maybe we are in for a year of benchmarks and crunch testing.

Btw, if you have 2500 documents to crunch, you can add machines to your LAN
and crunch them on multiple machines. It's not a purely linear process.
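
As a sketch of how the work could be split, assuming the documents are
independent of each other: the Python below deals 2500 documents out to a
fixed number of workers. The workers here are local threads standing in for
machines on the LAN, so this shows the partitioning only and would not by
itself speed up CPU-bound parsing. The names ElementCounter, partition,
parse_batch and crunch, and the sample document, are made up for illustration.

```python
import xml.sax
from concurrent.futures import ThreadPoolExecutor


class ElementCounter(xml.sax.ContentHandler):
    """SAX handler that only counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(xml_bytes):
    """Parse one document with SAX and return its element count."""
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count


def partition(items, n_workers):
    """Deal items round-robin into n_workers batches."""
    return [items[i::n_workers] for i in range(n_workers)]


def parse_batch(batch):
    """Work done by a single worker (one machine, in the LAN scenario)."""
    return sum(count_elements(doc) for doc in batch)


def crunch(documents, n_workers=4):
    """Parse all documents, one batch per worker, and combine the counts."""
    batches = partition(documents, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(parse_batch, batches))


documents = [b"<row><id>1</id><name>x</name></row>"] * 2500
total = crunch(documents)
print(total, "elements parsed across", len(documents), "documents")
```

On a real LAN each batch would be shipped to a separate machine and only the
per-batch results sent back.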

Best regards

David

On Tue, 19 Apr 2005 5:44 pm, Ken North wrote:
> > Michael Kay wrote:
> > > I would be very surprised if XML parsing contributes anything
> > > noticeable to the cost of a database load (in shredding mode).
>
> These benchmarks were run on different testbeds so this isn't an
> apples-to-apples comparison.
>
> This parser performance test of Expat (C, SAX) reported a best time of 0.05
> sec to parse an 884K document with 32K nodes. For 2500 documents, that
> would be approx. 125 seconds.
> http://okmij.org/ftp/Scheme/SSAX-benchmark-1.html
>
> This benchmark compared two SQL APIs. It was written in C and executed in a
> client-server mode, so there was network latency. It used an SQL INSERT,
> not a bulk load.
> http://www.datadirect.com/techres/odbc/docs/wp_odbcvsoci.pdf
>
> The average time to INSERT 2500 rows with ODBC was 23.53 seconds. The
> minimum execution time for an SQL SELECT query to return 2500 rows was 0.05
> sec.
>
> This single-cpu Java benchmark parsed simpler documents than the Expat test
> and the data was closer to the tables in the SQL API test. It took about 2
> seconds to parse 10,000 records using SAX, or about 0.5 seconds to parse
> 2500 records.
> http://www.devsphere.com/xml/benchmark/method.html
>
> This Python benchmark took between 2.32 and 3.97 seconds for a 3.3 MB
> document. http://www.oreillynet.com/pub/wlg/6291
>
> My guess is if these benchmarks were all run on the same testbed under the
> same conditions, we'd see a repeatable pattern:
>
> The parsing overhead for loading a database is negligible if we're
> processing simple documents, but becomes more significant as the documents
> increase in size.
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>
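
For what it's worth, the scaling arithmetic in the figures quoted above can
be checked with a few lines of Python. The numbers are the ones Ken quotes;
the scale helper is just illustrative, and since the benchmarks ran on
different testbeds the ratios are only a rough indication.

```python
def scale(seconds, measured_units, target_units):
    """Linearly scale a measured time to a different number of units."""
    return seconds * target_units / measured_units


# Expat: 0.05 s for one 884K document, scaled to 2500 documents.
expat_total = scale(0.05, 1, 2500)

# Java SAX: about 2 s for 10,000 simple records, scaled to 2500 records.
java_sax_total = scale(2.0, 10_000, 2500)

# ODBC: average time quoted for 2500 INSERTs.
odbc_insert_total = 23.53

print(f"Expat, large docs:   {expat_total:.1f} s")
print(f"Java SAX, small recs: {java_sax_total:.1f} s")
print(f"parse/insert ratio, large docs:  {expat_total / odbc_insert_total:.2f}")
print(f"parse/insert ratio, small recs:  {java_sax_total / odbc_insert_total:.3f}")
```

With small records the parse time is a few percent of the INSERT time, while
with the large documents it is several times larger, which matches the
pattern Ken guesses at.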

-- 
Computergrid : The ones with the most connections win.


Copyright 2001 XML.org. This site is hosted by OASIS