OASIS Mailing List Archives

Reinventing the wheel (an implementor point of view)



Hi everybody,

a few lines of introduction first.

I am a PhD student at the University of Pennsylvania (Database
Research Group) and I am -- among other interesting cool XML-related
projects -- the implementor of Kweelt, the first and only (as far as I
know) implementation of the Quilt proposal, now XQuery.

Given this, if you are looking for someone who has actually reinvented
the wheel, that's me. I don't pretend I have the truth (maybe there is
no truth), but at least I can claim to have some ideas about the
topic.

What I find good in XSL-T?
==========================
- people are actually using it
- it is really great for transforming trees
- a very interesting aspect of the rule-based/template paradigm
  is that you can navigate the entire tree and apply transformations
  depending on the context

What I find not that good in XSL-T?
====================================
- given a set of rules, it is often complicated to understand what's
  going on, because of implicit rules, etc.
- also, I am not aware of any optimization for it.
  As a simple exercise, try to write an XSL-T transformation to
  transpose a matrix and see how inefficient this can be.
- how do you debug XSL-T queries?
- performance
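
To make the matrix exercise concrete, here is the same transposition
written outside XSL-T, as a small Python sketch. The <matrix>/<row>/<cell>
encoding and the function name transpose are assumptions made for
illustration; the post does not say how the matrix is encoded. Each cell
is read once here, whereas a naive template-based formulation typically
re-selects the i-th cell of every row for each output row.

```python
import xml.etree.ElementTree as ET


def transpose(xml_text):
    """Transpose a matrix stored as <matrix><row><cell>..</cell></row></matrix>.

    The element names are hypothetical; adapt them to the real encoding.
    """
    source = ET.fromstring(xml_text)
    # Read every cell exactly once into a list of rows.
    rows = [[cell.text for cell in row.findall("cell")]
            for row in source.findall("row")]
    result = ET.Element("matrix")
    # zip(*rows) yields the columns of the input, i.e. the rows of the output.
    for column in zip(*rows):
        row_element = ET.SubElement(result, "row")
        for value in column:
            ET.SubElement(row_element, "cell").text = value
    return ET.tostring(result, encoding="unicode")
```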

What I find good in XQuery?
===========================
- XQuery inherits from SQL (or rather something called
  comprehensions). The advantage is that it is easy to understand a
  query because all the pieces are together (unlike XSL-T rules).
- XQuery offers a programming model (for queries) which is similar to
  SQL, where you create bindings, filter them with conditions and then
  produce the result.
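
The bind/filter/produce model described above can be imitated with a
Python comprehension. This is only an analogy, not XQuery and not Kweelt
code; the sample document, the element names and the function
cheap_titles are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, used only to show the shape of the query.
DOC = """<bib>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
  <book><title>Economics of Technology</title><price>129.95</price></book>
</bib>"""


def cheap_titles(xml_text, max_price):
    """Return the titles of all books cheaper than max_price, in document order."""
    root = ET.fromstring(xml_text)
    return [
        book.findtext("title")                        # produce the result
        for book in root.iter("book")                 # create the bindings
        if float(book.findtext("price")) < max_price  # filter with a condition
    ]
```

All three pieces of the query sit next to each other in one expression,
which is the readability point made above.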

What I find not that good in XQuery?
====================================
- actually I don't see how a language could be wrong. This is just
  syntax after all. From my implementation perspective, mixing HTML
  and arithmetic operators makes the parsing unnecessarily complex.
- in terms of semantics, the Quilt proposal was relying on XPath and
  it is not clear (to me) how order is handled.
  For more details, see http://db.cis.upenn.edu/DL/kweelt-TR.pdf
- XQuery wants to be a very expressive language and it is not clear to
  me how you can combine:
  . composition of queries with the ability to output any XML you want
  . optimization with recursive functions
  . optimization with FILTER
I think the main issue is not about languages. XQuery and XSL-T are
just surface languages for people to describe what they want. What is
more important is what there is underneath, to make the
evaluation/compilation of these languages efficient. The W3C XML
algebra unfortunately seems to be more of a specification tool than
an algebra used for optimization (in the sense of relational algebra
for relational databases).

My experience implementing Quilt/XQuery is that mapping queries to DOM
is easy within the limitations of the DOM implementation (document
size). The DOM API should be extended to look more like an XPath API,
to make calls more efficient. Also, supporting an iterator model would
be great. But what about larger documents?
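
As a rough idea of what an iterator model could look like, the sketch
below uses the iterparse function from Python's standard
xml.etree.ElementTree module to hand out matching elements one at a time
instead of building the whole tree first. It is not the DOM API and not
what Kweelt does; the generator name iter_elements is an assumption.

```python
import io
import xml.etree.ElementTree as ET


def iter_elements(source, tag):
    """Lazily yield each complete <tag> element from a file-like XML source."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield element
            # Once the consumer asks for the next item, drop this element's
            # children and text. The emptied element stays attached to its parent.
            element.clear()


# Usage: each element must be consumed before the next one is requested.
data = io.BytesIO(
    b"<bib><book><title>A</title></book><book><title>B</title></book></bib>"
)
titles = [book.findtext("title") for book in iter_elements(data, "book")]
```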

I have also implemented a SQL backend, where Quilt queries (actually
XPath bindings) are mapped into SQL. If you do it in a naive way, you
are dead, but if you restrict yourself to a subset of XPath, this
can be done.
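
To illustrate what such a restricted mapping might look like, here is a
minimal Python/SQLite sketch that handles only absolute child-axis paths
such as /bib/book/title. The node(id, parent, tag, text) table and the
functions load_edges and path_to_sql are assumptions made for this
example; they are not the schema or the translation used in Kweelt.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_edges(conn, xml_text):
    """Shred a document into a hypothetical node(id, parent, tag, text) table."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def insert(element, parent_id):
        nonlocal next_id
        next_id += 1
        node_id = next_id  # pre-order numbering, so id order is document order
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, parent_id, element.tag, (element.text or "").strip()),
        )
        for child in element:
            insert(child, node_id)

    insert(ET.fromstring(xml_text), None)


def path_to_sql(path):
    """Translate a path like /a/b/c into one SQL query with one self-join per step."""
    steps = path.strip("/").split("/")
    if not all(steps):
        raise ValueError("only simple /a/b/c paths are supported")
    tables = ["node n0"]
    conditions = ["n0.parent IS NULL", "n0.tag = ?"]
    for i in range(1, len(steps)):
        tables.append(f"node n{i}")
        conditions.append(f"n{i}.parent = n{i - 1}.id")
        conditions.append(f"n{i}.tag = ?")
    last = len(steps) - 1
    sql = (
        f"SELECT n{last}.text FROM {', '.join(tables)} "
        f"WHERE {' AND '.join(conditions)} ORDER BY n{last}.id"
    )
    return sql, steps  # tag names are passed as bound parameters


conn = sqlite3.connect(":memory:")
load_edges(
    conn,
    "<bib><book><title>A</title></book><book><title>B</title></book>"
    "<note><title>C</title></note></bib>",
)
sql, params = path_to_sql("/bib/book/title")
titles = [row[0] for row in conn.execute(sql, params)]
```

Every path step costs one self-join in this scheme, which hints at why a
naive translation of full XPath gets expensive quickly.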

The main issue (and it is actually a big research issue) is to
find/guess what the low-level operators for the XML data models
are. For relational databases, we already know them and this is what
makes DBMS fast. The holy grail should not be the language, but these
low-level operators (XScan, from Univ. of Washington, is a first
attempt. See http://data.cs.washington.edu/xml/xscan.pdf).

For more info, you can read a tech report about the implementation of
Kweelt: http://db.cis.upenn.edu/DL/kweelt-TR.pdf
The Kweelt code can be found at: http://db.cis.upenn.edu/Kweelt/

I will also give a talk at XML Dev Con 2001 in NYC in April.

best regards,

Arnaud Sahuguet