Parallel execution: functions, XPath, XQuery
- From: Ken North <email@example.com>
- To: XML Dev List <firstname.lastname@example.org>
- Date: Tue, 29 May 2001 07:48:30 +0200
> A review of the current working documents for XQuery and
> XPath 2.0 requirements raises some questions about building a
> search engine that can be deployed over a parallel processing
... snip ...
> So issue 1 is how to partition document collections for
> optimal parallel execution with XPath and XQuery.
Michael Rys wrote:
<< Due to the side-effect free nature of the FLWR expressions in XQuery
(and the functional semantics in XPath), I see no problem of being able
to parallelize XQuery expressions (assuming that UDFs are written
side-effect free as well).
<< Do you have any specific point in the functionality that makes you
Michael Kay wrote:
<< There is very little published information on XPath implementation
optimization at all, let alone parallel execution. But there is nothing
in the language to preclude it - or do you think otherwise?
How data is partitioned seems to be an important consideration. Slicing
data so a partition includes complete documents seems to be less of a
problem maintaining context than caching or partitioning by document
If, for example, we split a collection so each partition contains whole
documents, then we don't have problems with context, location paths, or
position(). However, we might learn from application profiling or query
profiling that a horizontal split is better for performance. Instead of
partitioning by creating collections of complete documents, we might
find that creating partitions of document fragments delivers better
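
To make the whole-document case concrete, here is a minimal sketch in Python using only the standard library. The sample collection, the round-robin split, and the function names are assumptions made for illustration; they do not come from any XQuery or XPath engine. A thread pool stands in for the parallel workers. The point is only that a positional step such as item[2] is resolved entirely inside one document, so each partition can be evaluated without knowing about the others.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical collection: each string is one complete document.
DOCS = [
    "<doc><item>a1</item><item>a2</item><item>a3</item></doc>",
    "<doc><item>b1</item><item>b2</item></doc>",
    "<doc><item>c1</item><item>c2</item><item>c3</item></doc>",
    "<doc><item>d1</item></doc>",
]

def partition_whole_docs(docs, n):
    """Round-robin split; every partition holds only complete documents."""
    return [docs[i::n] for i in range(n)]

def second_items(partition):
    """Evaluate the positional path item[2] on each document in a partition."""
    hits = []
    for text in partition:
        root = ET.fromstring(text)
        # item[2] is resolved within this one document, so no state from
        # any other partition is needed to get the position right.
        hits.extend(el.text for el in root.findall("./item[2]"))
    return hits

def run(docs, n_partitions):
    """Evaluate every partition in a worker pool and merge the results."""
    parts = partition_whole_docs(docs, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(second_items, parts)
    return sorted(hit for part in results for hit in part)
```

Calling run(DOCS, 1) and run(DOCS, 2) should give the same answer, which is what we would expect if the partitioning scheme does not disturb positional semantics.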
Now we have the problem of maintaining context so that positional
operations are correct; i.e., so that a reference to a relative or
absolute node path still resolves correctly. If I'm writing a UDF that
involves positional logic, I'm challenged to keep it free of side effects
under different schemes for partitioning a document collection for
parallel processing.
It seems I'll need some solution for getting or setting context when
data is partitioned by document fragments.
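
One possible shape for such a solution, sketched in Python under stated assumptions: each fragment carries its own context (the number of sibling nodes that precede it in the original document, plus the document-wide sibling count), and the positional function receives that context as an argument instead of reading shared state. The Fragment record and the function names are hypothetical, and a flat list of strings stands in for a sequence of sibling nodes.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Fragment:
    doc_id: str
    offset: int        # number of sibling nodes preceding this fragment
    total: int         # sibling count in the whole document (for last())
    items: List[str]   # the nodes held by this fragment

def split_into_fragments(doc_id, items, size):
    """Horizontal split: cut one document's siblings into fixed-size fragments."""
    return [
        Fragment(doc_id, start, len(items), items[start:start + size])
        for start in range(0, len(items), size)
    ]

def item_at(fragment, position) -> Optional[str]:
    """Resolve item[position] (1-based, as in XPath) against one fragment.

    The document-wide position is translated to a local index using the
    offset carried by the fragment, so the function stays side-effect free.
    """
    local = position - 1 - fragment.offset
    if 0 <= local < len(fragment.items):
        return fragment.items[local]
    return None

def query(fragments, position):
    """Ask every fragment independently; at most one of them can answer."""
    hits = (item_at(f, position) for f in fragments)
    return [h for h in hits if h is not None]
```

With five items split into fragments of two, query(fragments, 4) is answered by the second fragment alone, because its offset of 2 maps document position 4 to local index 1. Without the carried offset, each fragment would treat its first node as position 1 and the same query would give wrong answers.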