You've found some quite interesting references there.
One point I would like to make is from the Python link
(http://www.oreillynet.com/pub/wlg/6291), where mention
is made of the assumption of parsing 8-bit text documents
when Unicode docs may be the norm in the future.
As an old assembly language programmer, I can't help
but rub my hands with joy as this seems to be a sort of 32-bit
Unicode, at least as UTF-8, is (according to my understanding) an 8-bit
escaping system. That is, if the character is extended, it is written into a
second, third and further consecutive bytes as required.
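
To make the variable-width point concrete, here is a minimal sketch that prints how many bytes a few characters occupy once encoded as UTF-8. The sample characters and the helper name utf8_length are my own choices for illustration, not anything from the linked article.

```python
# Sketch: how many bytes a single character needs in UTF-8.
# The sample characters below are illustrative assumptions.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), hex {encoded.hex()}")
```

Plain ASCII stays at one byte, and the others spill into two, three and four bytes, which is the "escaping" behaviour described above.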
What a pain to chomp on a 32-bit processor.
So to do really *fast* Unicode stuff, ideally, the in-memory view wouldn't
store the characters in 8-bit, but just as 32-bit (4 byte) or 64-bit (8 byte)
Of course, I think this is a few months away. But true 32-bit in-memory
processing of XML might be fun.
Maybe we're in for a year of benchmarks and crunch testing.
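
A rough sketch of what a fixed-width 32-bit in-memory view could look like, assuming UTF-32 (little-endian) as the representation. The helper names to_utf32 and code_point_at are hypothetical, and this says nothing about how any real parser stores its text.

```python
# Sketch: a fixed-width, 4-bytes-per-character buffer (UTF-32, little-endian).
# to_utf32 and code_point_at are hypothetical helper names.

def to_utf32(text: str) -> bytes:
    """Encode text so that every character occupies exactly 4 bytes."""
    # Naming the byte order explicitly means no byte-order mark is prepended.
    return text.encode("utf-32-le")


def code_point_at(buf: bytes, index: int) -> int:
    """Fetch the character at a given position by offset arithmetic alone."""
    start = index * 4
    return int.from_bytes(buf[start:start + 4], "little")


text = "<a>\u20ac</a>"
buf = to_utf32(text)
print(len(text), "characters,", len(buf), "bytes as UTF-32,",
      len(text.encode("utf-8")), "bytes as UTF-8")
print(hex(code_point_at(buf, 3)))
```

The trade-off is memory: the same text takes more bytes than in UTF-8, in exchange for being able to jump to the Nth character without scanning.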
btw, if you have 2500 documents to crunch, you can add machines to your LAN and
crunch them on multiple machines. It's not a purely linear process.
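
As a single-machine stand-in for that idea, here is a minimal sketch that hands a batch of documents to a pool of workers, each parsing with Python's Expat binding. The function names count_elements and crunch, the worker count and the tiny sample documents are all assumptions for illustration. A thread pool is used only to keep the sketch short; real CPU parallelism would need separate processes or, as suggested above, separate machines.

```python
# Sketch: split a batch of XML documents across a pool of workers.
# A thread pool stands in for "more machines on the LAN" (an assumption).
from concurrent.futures import ThreadPoolExecutor
import xml.parsers.expat


def count_elements(document: bytes) -> int:
    """Parse one document with Expat and count its start tags."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)  # True marks this as the final chunk
    return count


def crunch(documents, workers=4):
    """Parse every document; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_elements, documents))


docs = [b"<r><a/><b/></r>", b"<r/>", b"<r><a><b/></a></r>"]
print(crunch(docs))
```

Each document is parsed independently of the others, which is what makes the batch easy to divide up.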
On Tue, 19 Apr 2005 5:44 pm, Ken North wrote:
> > Michael Kay wrote:
> > > I would be very surprised if XML parsing contributes anything
> > > noticeable to the cost of a database load (in shredding mode).
> These benchmarks were run on different testbeds so this isn't an
> apples-to-apples comparison.
> This parser performance test of Expat (C, SAX) reported a best time of 0.05
> sec to parse an 884K document with 32K nodes. For 2500 documents, that
> would be approx. 125 seconds.
> This benchmark compared two SQL APIs. It was written in C and executed in a
> client-server mode, so there was network latency. It used an SQL INSERT,
> not a bulk load.
> The average time to INSERT 2500 rows with ODBC was 23.53 seconds. The
> minimum execution time for an SQL SELECT query to return 2500 rows was 0.05
> This single-CPU Java benchmark parsed simpler documents than the Expat test
> and the data was closer to the tables in the SQL API test. It took about 2
> seconds to parse 10,000 records using SAX, or about 0.5 seconds to parse
> 2500 records.
> This Python benchmark took between 2.32 and 3.97 seconds for a 3.3 MB
> document. http://www.oreillynet.com/pub/wlg/6291
> My guess is if these benchmarks were all run on the same testbed under the
> same conditions, we'd see a repeatable pattern:
> The parsing overhead for loading a database is negligible if we're
> processing simple documents, but becomes more significant as the documents
> increase in size.
Computergrid : The ones with the most connections win.