RE: [xml-dev] XML Schema: "Best used with the ______ tool"
From: "Michael Kay" <mike@saxonica.com>
To: "'Boris Kolpackov'" <boris@codesynthesis.com>
Date: Wed, 3 Dec 2008 10:51:01 -0000

Yes, I'll be happy to run this. 25 seconds for a simple search of 800 KB
sounds very slow indeed. One question, though: it's not clear to me whether
you are measuring the time taken to load the XML, or only the time taken to
query it.
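
To make the distinction concrete, here is a minimal C++ sketch of timing the
two phases separately. The names time_seconds, phase_times and measure are
hypothetical and are not taken from the benchmark; the load and query
callables stand in for whatever the real test does.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: run f once and return the elapsed wall-clock seconds.
template <typename F>
double time_seconds (F&& f)
{
  typedef std::chrono::steady_clock steady;

  steady::time_point start (steady::now ());
  std::forward<F> (f) ();
  std::chrono::duration<double> elapsed (steady::now () - start);

  return elapsed.count ();
}

// Load time and query time, reported separately.
struct phase_times
{
  double load;   // one-off cost of parsing/building the document
  double query;  // total cost of all query iterations
};

// Time a single load, then `iterations` runs of the query.
template <typename Load, typename Query>
phase_times measure (Load&& load, Query&& query, int iterations)
{
  assert (iterations > 0);

  phase_times t;
  t.load = time_seconds (load);
  t.query = time_seconds ([&]
  {
    for (int i = 0; i < iterations; ++i)
      query ();
  });

  return t;
}
```

Reporting the two figures side by side shows whether the parse or the
repeated query dominates the total.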

My instinct would be to have the query return a sequence of nodes to the
calling application, and have the calling application extract the attributes
that it wants from those nodes.
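
In XQuery terms that means ending the query with "return $x" rather than a
concat() call. The calling side might then look roughly like the C++ sketch
below. It uses made-up stand-in types (node, record, select, extract), not
the XQilla or Saxon API, purely to show where the attribute extraction and
type conversion would live.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a query engine's node: one <person> element
// with its attributes held as strings.
struct node
{
  std::map<std::string, std::string> attributes;

  const std::string& attribute (const std::string& name) const
  {
    std::map<std::string, std::string>::const_iterator i (
      attributes.find (name));
    assert (i != attributes.end ());
    return i->second;
  }
};

// Stand-in for running the query: returns the matching nodes themselves
// instead of a delimited string.
std::vector<const node*>
select (const std::vector<node>& people,
        unsigned short max_age,
        const std::string& gender)
{
  std::vector<const node*> r;

  for (std::vector<node>::const_iterator i (people.begin ());
       i != people.end (); ++i)
  {
    if (std::stoul (i->attribute ("age")) < max_age &&
        i->attribute ("gender") == gender)
      r.push_back (&*i);
  }

  return r;
}

// The native-typed record the application wants.
struct record
{
  std::string first;
  std::string last;
  unsigned short age;
};

// The calling application extracts the attributes it needs from each node.
std::vector<record>
extract (const std::vector<const node*>& nodes)
{
  std::vector<record> r;

  for (std::vector<const node*>::const_iterator i (nodes.begin ());
       i != nodes.end (); ++i)
  {
    record rec;
    rec.first = (*i)->attribute ("first");
    rec.last = (*i)->attribute ("last");
    rec.age = static_cast<unsigned short> (
      std::stoul ((*i)->attribute ("age")));
    r.push_back (rec);
  }

  return r;
}
```

No delimiter has to be invented this way, because each field is read from
its own attribute.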

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Boris Kolpackov [mailto:boris@codesynthesis.com] 
> Sent: 03 December 2008 10:20
> To: Michael Kay
> Cc: 'Dennis Sosnoski'; xml-dev@lists.xml.org
> Subject: Re: [xml-dev] XML Schema: "Best used with the ______ tool"
> 
> Hi Michael,
> 
> Michael Kay <mike@saxonica.com> writes:
> 
> > > I agree with Dennis here in that XQuery can be usable when you need
> > > to access a small subset of an XML document.
> > > However, when one needs to access most of the data, or, worse,
> > > access the same data many times, data binding will have speed/memory
> > > advantage.
> > 
> > Evidence please! I don't see any reason why it should.
> 
> While I would also like to see the results of comparing XSLT 
> to data binding as agreed between you and Dennis, I came up 
> with a simple Binding/XQuery benchmark for the above scenario 
> (repetitive access to most of the data in a document). It 
> compares the speed of performing the same operation using one 
> implementation of XML data binding (CodeSynthesis XSD[1]) and 
> one implementation of XQuery (XQilla[2]). The source code, 
> including the schema and sample XML document is available here:
> 
> http://www.codesynthesis.com/~boris/tmp/dbxq.tar.gz
> 
> The benchmark operates on the following XML:
> 
> <t:people xmlns:t='http://www.example.com/test'>
>   <description>People catalog.</description>
>   <person first="John" last="Doe" age="30" gender="male"/>
>   <person first="Jane" middle="Q" last="Doe" age="28" gender="female"/>
> 
>   ...
> 
> </t:people>
> 
> The operation performed selects all the person records from 
> the catalog that are of a specific gender and younger than a 
> specific age. The input parameters (gender and max age) are 
> determined at runtime. The output is first name, last name, 
> and age for each matching record, expected as native types 
> (string, string, and unsigned short). If you need a concrete 
> example, think of it as a GUI or server application where the 
> user or client can specify gender and max age (in GUI or as a 
> non-XML request) and get the list of records back (again, in 
> GUI or as a non-XML response).
> 
> The benchmark for both data binding and XQuery has the 
> following overall structure:
> 
>  // Load XML
> 
>  Time start;
> 
>  for (i = 0; i < 100; i++)
>  {
>    // Receive input data: gender and max age
> 
>    // Find matching records
> 
>    // Send output data for each record: first name, last name, age
>  }
>  
>  Time end;
> 
> The query used in the XQuery test is as follows:
> 
> declare variable $age as xs:integer external;
> declare variable $gender as xs:string external;
> for $x in t:people/person
> where $x/@age < $age and $x/@gender = $gender
> return concat($x/@first, ':', $x/@last, ':', $x/@age)
> 
> The measurement on a 10000-record XML file (about 800 KB) 
> shows that the data binding test is about 280 times faster 
> than the XQuery test (0.09 sec vs 25.38 sec). I want to 
> repeat that this result is for two particular implementations 
> of XML data binding and XQuery. The test box is a 1.8 GHz 
> Opteron running x86-64 GNU/Linux.
> 
> Michael, would you be able to port this test to Saxon? 
> Because of such a big difference, I think it would be 
> interesting to see the numbers for Saxon even if the 
> languages are different (C++ and Java).
> 
> 
> > > Then there is the whole aspect of interfacing with the rest of the
> > > world. Assembling text queries from bits and pieces that come from
> > > different sources and then unpacking the results for further
> > > processing does not sound like something that is easy to use.
> > > 
> > Sounds a lot easier than coding it in Java.
> 
> Here are the relevant fragments from the two tests. First is data
> binding:
> 
>     people& ppl = ...
> 
>     bool male;
>     unsigned short max_age;
>     input (max_age, male);
> 
>     gender g (male ? gender::male : gender::female);
> 
>     people::person_sequence& s (ppl.person ());
> 
>     for (people::person_iterator pi (s.begin ()), e (s.end ());
>          pi != e; ++pi)
>     {
>       person& p (*pi);
> 
>       unsigned short age = p.age ();
> 
>       if (age < max_age && p.gender () == g)
>         output (p.first (), p.last (), age);    
>     }
> 
> Now XQuery:
> 
>     const char query_text[] =
>     "declare variable $age as xs:integer external;\n"
>     "declare variable $gender as xs:string external;\n"
>     "for $x in t:people/person\n"
>     "where $x/@age < $age and $x/@gender = $gender\n"
>     "return concat($x/@first, ':', $x/@last, ':', $x/@age)\n";
> 
>     XQQuery* query = ...
>     DynamicContext* context = ...
> 
>     bool male;
>     unsigned short max_age;
>     input (max_age, male);
> 
>     // Set external variables.
>     //
>     {
>       ItemFactory* factory (context->getItemFactory ());
> 
>       Item::Ptr value (factory->createInteger (max_age, context));
>       context->setExternalVariable (X ("age"), value);
> 
>       value = factory->createString (
>         X (male ? "male" : "female"), context);
>       context->setExternalVariable (X ("gender"), value);
>     }
> 
>     Result result (query->execute (context));
> 
>     // Iterate over the result.
>     //
>     Item::Ptr item;
>     while (item = result->next(context))
>     {
>       string r (UTF8 (item->asString (context)));
> 
>       string::size_type c1 = r.find (':');
>       string::size_type c2 = r.find (':', c1 + 1);
> 
>       string first (r, 0, c1);
>       string last (r, c1 + 1, c2 - c1 - 1);
>       string age_str (r, c2 + 1, r.size () - c2 - 1);
> 
>       unsigned short age;
>       istringstream istr (age_str);
>       istr >> age;
> 
>       output (first, last, age);
>     }
> 
> Note how in the XQuery case I had to invent a delimiter (':') 
> for the first, last, and age fields so that I could pass them 
> as a string from XQuery to the programming language. I then 
> had to manually parse this string and convert the age field 
> to unsigned short. To me, this doesn't look easier than data 
> binding at all.
> 
> [1] http://codesynthesis.com/products/xsd
> [2] http://xqilla.sourceforge.net
> 
> Boris
> 
> -- 
> Boris Kolpackov, Code Synthesis Tools   
> http://codesynthesis.com/~boris/blog
> Open source XML data binding for C++:   
> http://codesynthesis.com/products/xsd
> Mobile/embedded validating XML parsing: 
> http://codesynthesis.com/products/xsde