Re: [xml-dev] XML Schema: "Best used with the ______ tool"
- From: Boris Kolpackov <boris@codesynthesis.com>
- To: Michael Kay <mike@saxonica.com>
- Date: Wed, 3 Dec 2008 12:19:59 +0200
Hi Michael,
Michael Kay <mike@saxonica.com> writes:
> > I agree with Dennis here in that XQuery can be usable when
> > you need to access a small subset of an XML document.
> > However, when one needs to access most of the data, or,
> > worse, access the same data many times, data binding will
> > have speed/memory advantage.
>
> Evidence please! I don't see any reason why it should.
While I would also like to see the results of comparing XSLT to
data binding as agreed between you and Dennis, I came up with a
simple Binding/XQuery benchmark for the above scenario (repetitive
access to most of the data in a document). It compares the speed
of performing the same operation using one implementation of XML
data binding (CodeSynthesis XSD[1]) and one implementation of
XQuery (XQilla[2]). The source code, including the schema and
sample XML document, is available here:
http://www.codesynthesis.com/~boris/tmp/dbxq.tar.gz
The benchmark operates on the following XML:
<t:people xmlns:t='http://www.example.com/test'>
<description>People catalog.</description>
<person first="John" last="Doe" age="30" gender="male"/>
<person first="Jane" middle="Q" last="Doe" age="28" gender="female"/>
...
</t:people>
The operation performed selects all the person records from the
catalog that are of a specific gender and younger than a specific
age. The input parameters (gender and max age) are determined at
runtime. The output is first name, last name, and age for each
matching record, expected as native types (string, string, and
unsigned short). If you need a concrete example, think of it as
a GUI or server application where the user or client can specify
gender and max age (in the GUI or as a non-XML request) and get the
list of records back (again, in the GUI or as a non-XML response).
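To make the operation concrete independently of either XML technology, here is a minimal sketch of the same selection over plain in-memory records. The `person`, `match`, and `select_people` names are hypothetical stand-ins chosen for illustration; they are not the classes generated by CodeSynthesis XSD, and this code is not part of the benchmark.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the record and result types.
enum class gender { male, female };

struct person
{
  std::string first, last;
  unsigned short age;
  gender sex;
};

struct match
{
  std::string first, last;
  unsigned short age;
};

// Return first name, last name, and age of every record that has
// the requested gender and is strictly younger than max_age.
std::vector<match>
select_people (const std::vector<person>& people,
               gender g,
               unsigned short max_age)
{
  std::vector<match> r;
  for (const person& p : people)
    if (p.age < max_age && p.sex == g)
      r.push_back (match {p.first, p.last, p.age});
  return r;
}
```

Both inputs are ordinary runtime values and the result is already in native types, which is the shape of interface the benchmark expects from each implementation.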
The benchmark for both data binding and XQuery has the following
overall structure:
// Load XML
Time start;
for (i = 0; i < 100; i++)
{
// Receive input data: gender and max age
// Find matching records
// Send output data for each record: first name, last name, age
}
Time end;
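The outline above could be fleshed out roughly as follows. This is only a sketch: `time_iterations` is a hypothetical helper, and `std::chrono::steady_clock` is an assumed choice of timer; the actual timing code in the benchmark archive may differ.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: call `iteration` `count` times and return the
// elapsed wall-clock time in seconds. Loading the XML is expected to
// happen before this call so that it is excluded from the measurement.
template <typename F>
double
time_iterations (std::size_t count, F iteration)
{
  typedef std::chrono::steady_clock clock;

  clock::time_point start (clock::now ());

  for (std::size_t i (0); i < count; ++i)
    iteration (); // receive input, find matching records, send output

  clock::time_point end (clock::now ());

  return std::chrono::duration<double> (end - start).count ();
}
```

Passing the loop body as a callable lets the same harness wrap the data binding and the XQuery variants, so only the work inside the loop differs between the two measurements.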
The query used in the XQuery test is as follows:
declare variable $age as xs:integer external;
declare variable $gender as xs:string external;
for $x in t:people/person
where $x/@age < $age and $x/@gender = $gender
return concat($x/@first, ':', $x/@last, ':', $x/@age)
The measurement on a 10000-record XML file (about 800 KB) shows
that the data binding test is about 280 times faster than the
XQuery test (0.09 sec vs 25.38 sec). I want to repeat that this
result is for two particular implementations of XML data binding
and XQuery. The test box is a 1.8 GHz Opteron running x86-64 GNU/Linux.
Michael, would you be able to port this test to Saxon? Because of
such a big difference, I think it would be interesting to see the
numbers for Saxon even if the languages are different (C++ and Java).
> > Then there is the whole aspect of interfacing with the rest
> > of the world. Assembling text queries from bits and pieces
> > that come from different sources and then unpacking the
> > results for further processing does not sound like something
> > that is easy to use.
> >
> Sounds a lot easier than coding it in Java.
Here are the relevant fragments from the two tests. First is data
binding:
people& ppl = ...
bool male;
unsigned short max_age;
input (max_age, male);
gender g (male ? gender::male : gender::female);
people::person_sequence& s (ppl.person ());
for (people::person_iterator pi (s.begin()), e (s.end ()); pi != e; ++pi)
{
person& p (*pi);
unsigned short age = p.age ();
if (age < max_age && p.gender () == g)
output (p.first (), p.last (), age);
}
Now XQuery:
const char query_text[] =
"declare variable $age as xs:integer external;\n"
"declare variable $gender as xs:string external;\n"
"for $x in t:people/person\n"
"where $x/@age < $age and $x/@gender = $gender\n"
"return concat($x/@first, ':', $x/@last, ':', $x/@age)\n";
XQQuery* query = ...
DynamicContext* context = ...
bool male;
unsigned short max_age;
input (max_age, male);
// Set external variables.
//
{
ItemFactory* factory (context->getItemFactory ());
Item::Ptr value (factory->createInteger (max_age, context));
context->setExternalVariable (X ("age"), value);
value = factory->createString (X (male ? "male" : "female"), context);
context->setExternalVariable (X ("gender"), value);
}
Result result (query->execute (context));
// Iterate over the result.
//
Item::Ptr item;
while (item = result->next(context))
{
string r (UTF8 (item->asString (context)));
string::size_type c1 = r.find (':');
string::size_type c2 = r.find (':', c1 + 1);
string first (r, 0, c1);
string last (r, c1 + 1, c2 - c1 - 1);
string age_str (r, c2 + 1, r.size () - c2 - 1);
unsigned short age;
istringstream istr (age_str);
istr >> age;
output (first, last, age);
}
Note how in the XQuery case I had to invent a delimiter (':') for
the first, last, and age fields so that I could pass them as a string
from XQuery to the programming language. I then had to manually parse
this string and convert the age field to unsigned short. To me, this
doesn't look easier than data binding at all.
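The unpacking step can be isolated into a small helper to show what the delimiter convention costs. This is a sketch only: `record` and `parse_record` are hypothetical names, and the error checks are an addition that the fragment above does not have.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type for one unpacked query result.
struct record
{
  std::string first, last;
  unsigned short age;
};

// Split "first:last:age" on the invented ':' delimiter. Returns false
// if fewer than two delimiters are present or if the text after the
// second delimiter does not start with a number.
bool
parse_record (const std::string& s, record& r)
{
  std::string::size_type c1 (s.find (':'));
  if (c1 == std::string::npos)
    return false;

  std::string::size_type c2 (s.find (':', c1 + 1));
  if (c2 == std::string::npos)
    return false;

  r.first.assign (s, 0, c1);
  r.last.assign (s, c1 + 1, c2 - c1 - 1);

  std::istringstream istr (s.substr (c2 + 1));
  return !(istr >> r.age).fail ();
}
```

Note that a name which itself contains ':' would be split in the wrong place, so the delimiter has to be chosen with knowledge of the data, or the values have to be escaped on the XQuery side.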
[1] http://codesynthesis.com/products/xsd
[2] http://xqilla.sourceforge.net
Boris
--
Boris Kolpackov, Code Synthesis Tools http://codesynthesis.com/~boris/blog
Open source XML data binding for C++: http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde