xml-dev - Re: Streaming XML (WAS: More on taming SAX (was Re: [xml-dev] ANN

To: xml-dev@lists.xml.org
Subject: Re: Streaming XML (WAS: More on taming SAX (was Re: [xml-dev] ANN: Amara XML Toolkit 0.9.0))
From: "Dimitre Novatchev" <dnovatchev@yahoo.com>
Date: Wed, 29 Dec 2004 12:50:22 +1100
References: <830178CE7378FC40BC6F1DDADCFDD1D10276723C@RED-MSG-31.redmond.corp.microsoft.com> <30291DBF-590E-11D9-A33A-000393DC762C@mac.com> <cqsvbp$fbi$1@sea.gmane.org>
Sender: news <news@sea.gmane.org>

I wrote:

> When one writes:
>
>    f:foldl-tree($f:add, $f:add(), 0, /*)

This should have been:

When one writes:

   f:foldl-tree(f:add(), f:add(), 0, /*)


Dimitre.


"Dimitre Novatchev" <dnovatchev@yahoo.com> wrote in message 
news:cqsvbp$fbi$1@sea.gmane.org...
> Why do I think Daniela Florescu is right?
>
> Please bear with my style, which has nothing to do with SAX or any of the
> APIs mentioned in this thread. Just read on; I promise you'll agree that
> my message is relevant.
>
> This is the code of the f:foldl-tree() function, which is part of the FXSL 
> library:
>
> <xsl:stylesheet version="2.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> xmlns:f="http://fxsl.sf.net/"
> xmlns:int="http://fxsl.sf.net/int/folfl-tree"
> exclude-result-prefixes="f int"
> >
>    <xsl:import href="func-apply.xsl"/>
>
>    <xsl:function name="f:foldl-tree">
>      <xsl:param name="pFuncNode" as="element()"/>
>      <xsl:param name="pFuncSubtrees" as="element()"/>
>      <xsl:param name="pA0"/>
>      <xsl:param name="pNode" as="element()"/>
>
>      <xsl:choose>
>         <xsl:when test="not($pNode)">
>            <xsl:copy-of select="$pA0"/>
>         </xsl:when>
>         <xsl:otherwise>
>            <xsl:variable name="vSubtrees" select="$pNode/*"/>
>
>            <xsl:sequence select=
>             "f:apply($pFuncNode,
>                      $pNode/@tree-nodeLabel,
>                      int:foldl-tree_($pFuncNode, $pFuncSubtrees, $pA0,
>                                      $vSubtrees)
>                      )"
>            />
>         </xsl:otherwise>
>      </xsl:choose>
>    </xsl:function>
>
>    <xsl:function name="int:foldl-tree_">
>      <xsl:param name="pFuncNode" as="element()"/>
>      <xsl:param name="pFuncSubtrees" as="element()"/>
>      <xsl:param name="pA0"/>
>      <xsl:param name="pSubTrees" as="element()*"/>
>
>      <xsl:sequence select=
>       "if(empty($pSubTrees))
>         then $pA0
>         else
>           f:apply($pFuncSubtrees,
>               f:foldl-tree($pFuncNode, $pFuncSubtrees, $pA0,
>                            $pSubTrees[1]),
>               int:foldl-tree_($pFuncNode, $pFuncSubtrees, $pA0,
>                               $pSubTrees[position() > 1])
>                   )"
>       />
>    </xsl:function>
> </xsl:stylesheet>
>
> In a few words, this is a generic fold() but over a tree (not just over a 
> list). As such, it needs two functions to be provided as parameters -- one 
> for processing the current node and one for processing all subtrees of the 
> current node.
>
> When one writes:
>
>    f:foldl-tree($f:add, $f:add(), 0, /*)
>
> the result of evaluating this is the sum of the values of all 
> @tree-nodeLabel attributes of all nodes in the tree.
>
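
For readers who do not read XSLT, here is a minimal sketch of the same shape of fold, written in Python rather than XSLT. The Node class, the names foldl_tree and foldl_subtrees, and the sample tree are assumptions made purely for illustration; none of them is part of FXSL. Plain Python callables stand in for the function arguments that f:apply() dispatches on.

```python
from dataclasses import dataclass, field
from operator import add


@dataclass
class Node:
    """Hypothetical stand-in for an element carrying a @tree-nodeLabel value."""
    label: int
    children: list = field(default_factory=list)


def foldl_tree(f_node, f_subtrees, a0, node):
    """Combine this node's label with the folded result of its subtrees."""
    return f_node(node.label,
                  foldl_subtrees(f_node, f_subtrees, a0, node.children))


def foldl_subtrees(f_node, f_subtrees, a0, subtrees):
    """Fold a sequence of sibling subtrees (the role of int:foldl-tree_)."""
    if not subtrees:
        return a0
    return f_subtrees(
        foldl_tree(f_node, f_subtrees, a0, subtrees[0]),
        foldl_subtrees(f_node, f_subtrees, a0, subtrees[1:]),
    )


# Sample tree (an assumption): 1 -> (2 -> 4), 3
tree = Node(1, [Node(2, [Node(4)]), Node(3)])

total = foldl_tree(add, add, 0, tree)    # sum of all labels
largest = foldl_tree(max, max, 0, tree)  # same fold, different functions
```

Passing max instead of add changes what is computed without touching the traversal, which is the whole point of supplying the two functions as parameters.
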
> If I pass as parameters other functions, I'll perform other processing on 
> a (any!) tree.
>
> So, in case of XSLT/XQuery processing, we pass the necessary two functions 
> as parameters to f:foldl-tree() and we have implemented an XSLT/XQuery 
> processor.
>
> Why is this all relevant to the current discussion?
>
> Because: a fold() processing of any kind is essentially streaming.
>
> Therefore, let's just provide the required two functions and not worry how 
> the function engine does streaming -- there could be reasonably efficient 
> implementations. The most obvious example is a lazy implementation -- no 
> subtrees are ever processed unless ultimately required.
>
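
To make the "lazy implementation" point concrete, here is a hedged Python sketch; it is not FXSL code. It assumes the two supplied functions receive the not-yet-computed parts as zero-argument callables (thunks), so a subtree is folded only if the function actually calls the thunk. Node, lazy_fold, lazy_fold_subtrees, contains and the visited list are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; label plays the role of @tree-nodeLabel."""
    label: int
    children: list = field(default_factory=list)


visited = []  # records which nodes the fold actually touched


def lazy_fold(f_node, f_subtrees, a0, node):
    """Fold a node; the subtree result is passed as an unevaluated thunk."""
    visited.append(node.label)
    return f_node(
        node.label,
        lambda: lazy_fold_subtrees(f_node, f_subtrees, a0, node.children),
    )


def lazy_fold_subtrees(f_node, f_subtrees, a0, subtrees):
    """Fold sibling subtrees; both operands are passed as thunks."""
    if not subtrees:
        return a0
    head, rest = subtrees[0], subtrees[1:]
    return f_subtrees(
        lambda: lazy_fold(f_node, f_subtrees, a0, head),
        lambda: lazy_fold_subtrees(f_node, f_subtrees, a0, rest),
    )


def contains(target):
    """Build the two functions for a short-circuiting membership test."""
    f_node = lambda label, rest: label == target or rest()
    f_subtrees = lambda first, rest: first() or rest()
    return f_node, f_subtrees


# Sample tree (an assumption): 1 -> (2 -> 4), (3 -> 5)
tree = Node(1, [Node(2, [Node(4)]), Node(3, [Node(5)])])

f_node, f_subtrees = contains(2)
found = lazy_fold(f_node, f_subtrees, False, tree)
# visited now lists only the nodes that had to be examined.
```

Because "or" never calls the right-hand thunk once the left side is true, the subtrees below and after the matching node are never folded.
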
> What is more, in a lazy implementation the source tree can itself be
> evaluated lazily -- only those nodes/subtrees that are ultimately
> required will need to be parsed.
>
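
One way to approximate a lazily evaluated source tree with off-the-shelf tools is a pull parser. The sketch below uses Python's standard xml.etree.ElementTree.iterparse and stops asking for events as soon as the wanted element has been seen. The sample document and the helper name first_with_label are assumptions for illustration. Note that the parser reads its input in blocks, so for a tiny document like this one the bytes have already been read; the saving only becomes real for inputs larger than a block.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the attribute name follows the post.
XML = b"""<root tree-nodeLabel="1">
  <a tree-nodeLabel="2"><b tree-nodeLabel="4"/></a>
  <c tree-nodeLabel="3"><d tree-nodeLabel="5"/></c>
</root>"""


def first_with_label(source, wanted):
    """Pull 'start' events until an element with the wanted label appears.

    Returns the matching tag (or None) and the tags seen up to that point.
    """
    seen = []
    for _event, elem in ET.iterparse(source, events=("start",)):
        seen.append(elem.tag)
        if elem.get("tree-nodeLabel") == wanted:
            return elem.tag, seen  # stop pulling; the rest is never requested
    return None, seen


tag, seen = first_with_label(io.BytesIO(XML), "2")
```

The consumer decides how far parsing has to go, which is the property the fold needs from its source.
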
> Just as a side note -- streaming a tree implies linearization -- this may 
> go against efficiency when opposed to parallelization (e.g. using a DVC 
> (divide and conquer) approach), which is the ultimate strength of 
> functional languages and will start to matter more and more as explained 
> in the paper "The Free Lunch Is Over: A Fundamental Turn Toward 
> Concurrency in Software" 
> (http://www.gotw.ca/publications/concurrency-ddj.htm) by Herb Sutter.
>
> Parallelization may require that different threads share the same data,
> which delays the point at which this data can be discarded from memory.
>
>
> Cheers,
>
> Dimitre Novatchev.
>
>
> "Daniela Florescu" <dflorescu@mac.com> wrote in message 
> news:30291DBF-590E-11D9-A33A-000393DC762C@mac.com...
>>> As someone who was until very recently "one of those implementers", I
>>> completely disagree with you. We had customers who wanted to process
>>> XML documents that are hundreds of megabytes to gigabytes in size and
>>> who can't afford to materialize even a fraction of these documents in
>>> certain cases.
>>
>>
>> Dare,
>>
>> what exactly are you disagreeing with?
>>
>> This discussion is going in zigzags. Did you read my postings? Did I ever
>> tell you that XQuery was the solution for **everything**!? I don't
>> remember saying that.
>>
>> I was just reading this SAX/streaming/memory consumption discussion, and
>> being a person who actually designed and implemented such a streaming XML
>> query processor, I had a terrible sensation of deja vu. There are solid
>> solutions in the published and implemented state of the art already.
>>
>> I was just curious to know if there are deep technical reasons why people
>> have to reinvent such techniques. I learned that there are cases where
>> indeed there is no point in using preexisting XML processors, simply
>> because they don't apply, and people have to do it by hand.
>>
>> But I also learned that a lot of reinventing the wheel is also for fun.
>> I'm not going to comment on that. Next time I take a plane I can only
>> cross my fingers that the people who designed the air traffic control
>> system optimized for something different than their programmers' fun.
>>
>> So I reiterate my point: there are well-known techniques to maximize
>> streaming and minimize memory consumption. Many of them are already
>> implemented in existing systems, and many will show up in the next
>> versions of various industrial-strength products.
>>
>> In a big majority of the cases, people who need to process XML don't need
>> to understand the gory details of buffer management. And they shouldn't.
>> They should concentrate only on the logic of their application, and rely
>> on good XSLT/XQuery compilers and runtimes to do the right job concerning
>> the implementation strategy.
>>
>> As for the well-known techniques for minimizing memory consumption, I am
>> afraid that I cannot point to any specific technique on this mailing
>> list, for the following reasons:
>>
>> (a) it's too much literature to be discussed in such a forum
>> (b) a lot of it is folklore
>> (c) a lot of it is simply inherited from streaming and lazy evaluation of
>> SQL query processors, using the iterator model (Goetz Graefe can tell
>> you much more about that than I can, and he's closer to you), and you
>> can imagine how much folklore there is too after 30 years
>>
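
For anyone who has not met the iterator model mentioned in (c), here is a rough Python sketch of the idea using generators: each operator pulls rows from its child one at a time, so nothing upstream runs further than the consumer demands. The operator names (scan, select, project, limit), the row layout and the pulled list are assumptions for illustration and do not describe any particular query processor.

```python
from itertools import islice

pulled = []  # records which source rows were actually produced


def source():
    """Hypothetical row source; far larger than anything we will consume."""
    for i in range(1_000_000):
        pulled.append(i)
        yield {"id": i, "size": i * 10}


def scan(rows):
    """Leaf operator: hand rows upward one at a time."""
    for row in rows:
        yield row


def select(predicate, child):
    """Pass through only the rows that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def project(columns, child):
    """Keep only the named columns of each row."""
    for row in child:
        yield {c: row[c] for c in columns}


def limit(n, child):
    """Stop pulling from the child after n rows."""
    yield from islice(child, n)


plan = limit(2, project(["id"], select(lambda r: r["size"] >= 30,
                                       scan(source()))))
result = list(plan)
```

Only the rows needed to satisfy the limit are ever generated; the remaining source rows are never materialized.
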
>> The best idea that comes to my mind is to encourage somebody to write a
>> survey of such techniques; that might be helpful.
>>
>> My conclusion: please rely on good compilers, good optimizers and good
>> runtimes instead of writing XML processors by hand if you don't *really*
>> have to (and few people really have to). And trust the vendors/open
>> source implementors that they will produce such good compilers,
>> optimizers and runtimes when the time comes.
>>
>> As far as I am concerned, the horse is dead, I don't have much else to 
>> add.
>>
>> Best regards, have a wonderful holiday season,
>> Dana
>>
>>
>>
>>
>> -----------------------------------------------------------------
>> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>> initiative of OASIS <http://www.oasis-open.org>
>>
>> The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>> To subscribe or unsubscribe from this list use the subscription
>> manager: <http://www.oasis-open.org/mlmanage/index.php>
>>
>>
