Re: [xml-dev] Re: Discover data patterns or Create data patterns?

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

From: "Dimitre Novatchev" <dnovatchev@gmail.com>
To: "Michael Kay" <mike@saxonica.com>
Date: Mon, 22 Sep 2008 20:51:41 -0700

"Michael Kay" <mike@saxonica.com> wrote in message
8D80A281FA3E4ED5B92DC5AFF2BA16F8@Sealion">news:8D80A281FA3E4ED5B92DC5AFF2BA16F8@Sealion...
>> A good example of a problem that can be completely
>> eliminated: use a Finger-Tree-based sequence for all
>> sequences, then there is no need to worry to convert from one
>> sequence type to another (of course, the items still need to
>> be of the same type). Not to mention the gains in efficiency.
>>
>
> Well, perhaps it was reading your blog that triggered me to use this
> example. But I don't believe it's generally the case that for any given
> interface, there is one implementation that will be optimum (or even "good
> enough") across all usage patterns.

I am sorry I couldn't find free time to reply to a similar comment on
my blog, hope now I am replying to it, too.

I share your opinion. Even if a finger-tree-based collection is good
enough in 80% of the cases, then it would be a useful strategy to have
this implementation as the basic (default) one at the start and then
replace it with something else only in cases where this is proven
necessary.

I believe that in most cases the finger tree will be satisfactory
enough. Plus the added benefit of not having to maintain and think of
different collection implementations. Probably in many applications
we'll not need to substitute the finger tree with something else. Only
in special cases such need will be felt and justified.

>
> On the specific question of the fingertree and its possible use as an
> implementation of XPath 2.0 sequences in memory, I'm a little sceptical.
> Firstly, XSLT/XPath implementations try to avoid storing sequences in memory
> wherever possible, relying instead on lazy evaluation and pipelining.

The finger-tree data structure allows a lazy implementation, too. In
fact, I haven't done this in a completely lazy way: just to leave some
more fun for the future and more seriously, because I did doubt a
completely lazy implementation would result in more significant
performance gains. One way to add more laziness is not to construct
the "innerFT" finger tree until some future operation doesn't require
this. Because of the recursive definition of this structure, laziness
is possible at any level of recursion.

> Secondly, where operations are performed on sequences, the original input
> usually comes from a path expression - that is, an iterator over nodes in an
> XML tree. In your performance measurements, to show the superiority of this
> data structure, you had to suppress Saxon's optimizations by writing
> obfuscatory code to defeat the optimizer. I would be more interested if you
> could show that real programs benefit even when you allow the optimizer (and
> lazy evaluation / pipelining) to do its work.

That will be the case whenever there are repeated append, prepend,
concat, insert, delete and split operations on a sequence. Without an
optimized data structure the complexity of such operations is usually
O(N^2). With a finger tree the complexity is either O(1) (for prepend
and append), or sublinear.

I will be interested to perform such an experiment with real XSLT
applications (the two Sudoku solvers and the FXSL LR parsing system
naturally come to mind) whenever I find some free time.

An ordered collection implemented on a finger tree can be naturally
used in XPath operations -- it will sort and dedublicate nodes without
the need for any additional code.

>
> You also apply the data structure to storage of strings. For string
> operations (such as substring and concat), Saxon isn't currently using lazy
> evaluation or pipelining; but it could, and my instinct is that this could
> match the benefits you are seeing for the kind of application you are using
> as a test case.


This would be interesting to see. The string tests were one of the
very few in which Saxon wasn't the best of the three tested XSLT
processors. Even the one that obviously used a directly addressable
char array was significantly slower than the finger-tree-based string.

Of course, the performance gains can be demonstrated and can be useful
on collections/strings of moderate or big length; a collection of a
few items does not need any optimization.

>
> (Reading this through, it sounds too dismissive. I'm not rejecting the idea,
> I'm just expressing initial scepticism).
>
> Michael Kay

Any false enthusiasm would be really harmful, your interest and
analysis are truly valuable.


Thanks,
Dimitre Novatchev


>
>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
>

References:
- Re: Discover data patterns or Create data patterns?
  - From: "Dimitre Novatchev" <dnovatchev@gmail.com>
- Re: [xml-dev] Re: Discover data patterns or Create data patterns?
  - From: Rick Jelliffe <rjelliffe@allette.com.au>
- Re: [xml-dev] Re: Discover data patterns or Create data patterns?
  - From: "Dimitre Novatchev" <dnovatchev@gmail.com>
- RE: [xml-dev] Re: Discover data patterns or Create data patterns?
  - From: "Michael Kay" <mike@saxonica.com>

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]