OASIS Mailing List Archives

 



Parallel execution: functions, XPath, XQuery



> A review of the current working documents for XQuery and
> XPath 2.0 requirements raises some questions about building a
> search engine that can be deployed over a parallel processing
> architecture.
... snip ...
> So issue 1 is how to partition document collections for
> optimal parallel execution with XPath and XQuery.


Michael Rys wrote:
<< Due to the side-effect free nature of the FLWR expressions in XQuery
(and the functional semantics in XPath), I see no problem of being able
to parallelize XQuery expressions (assuming that UDFs are written
side-effect free as well).

<< Do you have any specific point in the functionality that makes you
believe otherwise?

Michael Kay wrote:
<< There is very little published information on XPath implementation and
optimization at all, let alone parallel execution. But there is nothing in
the language to preclude it - or do you think otherwise?

How data is partitioned seems to be an important consideration. Slicing
data so that each partition contains complete documents seems to pose
less of a problem for maintaining context than caching or partitioning
by document fragments.

If, for example, we split a collection so each partition contains whole
documents, then we don't have problems with context, location paths, or
position(). However, we might learn from application profiling or query
profiling that a horizontal split is better for performance. Instead of
creating collections of complete documents, we might find that
partitions of document fragments deliver better performance.
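
To make the whole-document case concrete, here is a rough sketch in
Python rather than XQuery. The sample documents, the sec[2] path, and
the function names second_sections and run_partitioned are all made up
for illustration; ElementTree's limited XPath support stands in for a
real engine. It only shows that a positional step gives the same answer
whether the collection is processed serially or in partitions of
complete documents.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical collection: each entry is one complete document.
DOCS = [
    "<doc id='a'><sec>a1</sec><sec>a2</sec><sec>a3</sec></doc>",
    "<doc id='b'><sec>b1</sec><sec>b2</sec></doc>",
    "<doc id='c'><sec>c1</sec></doc>",
    "<doc id='d'><sec>d1</sec><sec>d2</sec></doc>",
]

def second_sections(partition):
    """Evaluate the positional path sec[2] against each whole document."""
    hits = []
    for xml_text in partition:
        root = ET.fromstring(xml_text)
        # Position is counted within one document, so no other
        # partition can influence the result.
        hits.extend(node.text for node in root.findall("sec[2]"))
    return hits

def run_partitioned(docs, n_partitions):
    """Split docs into contiguous slices and query the slices in parallel."""
    size = -(-len(docs) // n_partitions)  # ceiling division
    partitions = [docs[i:i + size] for i in range(0, len(docs), size)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(second_sections, partitions)  # keeps slice order
    return [hit for part in results for hit in part]

serial = second_sections(DOCS)
parallel = run_partitioned(DOCS, 2)
print(serial == parallel)
```

Because each worker sees complete documents, the query function needs
no knowledge of how the collection was split.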

Now we have the problem of maintaining context so that positional
operations are correct, i.e., so that a reference to a relative or
absolute node path works correctly. If I'm writing a UDF that involves
positional logic, I'm challenged to keep it free of side effects arising
from the different schemes for partitioning a document collection for
parallel processing. It seems I'll need some way of getting or setting
context when data is partitioned by document fragments.
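
One possible shape for that solution, again sketched in Python with
invented names (split_into_fragments, naive_select,
context_aware_select) and a toy document: each fragment carries the
1-based position of its first node in the original document, and the
positional function receives that offset as an ordinary argument. This
is only an assumption about how context could be passed along, not a
feature of XPath or XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical single document that will be split across partitions.
DOC = ("<doc><sec>s1</sec><sec>s2</sec><sec>s3</sec>"
       "<sec>s4</sec><sec>s5</sec></doc>")

def split_into_fragments(xml_text, size):
    """Split the root's children into fragments, pairing each fragment
    with the 1-based document position of its first node."""
    children = list(ET.fromstring(xml_text))
    return [(start + 1, children[start:start + size])
            for start in range(0, len(children), size)]

def naive_select(fragment_nodes, position):
    """Treat the fragment as if it were the whole document (wrong context)."""
    if position <= len(fragment_nodes):
        return fragment_nodes[position - 1].text
    return None

def context_aware_select(offset, fragment_nodes, position):
    """Map the document-level position to a fragment-local index."""
    local = position - offset
    if 0 <= local < len(fragment_nodes):
        return fragment_nodes[local].text
    return None

fragments = split_into_fragments(DOC, 2)
# Ask every fragment for "the 2nd sec of the document".
naive = [naive_select(nodes, 2) for _, nodes in fragments]
aware = [context_aware_select(off, nodes, 2) for off, nodes in fragments]
print(naive)
print(aware)
```

The naive version returns a node from every fragment that has at least
two children, while the offset-carrying version returns a node only
from the fragment that really holds the document's second sec. Since
the offset is an explicit input, the function stays free of side
effects regardless of the partitioning scheme.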