xml-dev - Re: [xml-dev] Streaming XML (WAS: More on taming SAX (was Re: [xm


To: Dare Obasanjo <dareo@microsoft.com>
Subject: Re: [xml-dev] Streaming XML (WAS: More on taming SAX (was Re: [xml-dev] ANN: Amara XML Toolkit 0.9.0))
From: Daniela Florescu <dflorescu@mac.com>
Date: Tue, 28 Dec 2004 12:22:25 -0800
Cc: XML Developers List <xml-dev@lists.xml.org>
In-reply-to: <830178CE7378FC40BC6F1DDADCFDD1D10276723C@RED-MSG-31.redmond.corp.microsoft.com>
References: <830178CE7378FC40BC6F1DDADCFDD1D10276723C@RED-MSG-31.redmond.corp.microsoft.com>

> As someone who was until very recently "one of those implementers" I 
> completely disagree with you. We had customers who want to process XML 
> documents that are hundreds of megabytes to gigabytes in size who can't 
> afford to materialize even a fraction of these documents in certain 
> cases.


Dare,

What exactly are you disagreeing with?

This discussion is zigzagging. Did you read my postings? Did I ever
tell you that XQuery was the solution for **everything**!? I don't
remember saying that.

I was just reading this SAX/streaming/memory consumption discussion,
and being a person who actually designed and implemented such a
streaming XML query processor, I had a terrible sensation of deja vu.
There are solid solutions in the published and implemented state of
the art already.

I was just curious to know whether there are deep technical reasons
why people have to reinvent such techniques. I learned that there are
indeed cases where there is no point in using preexisting XML
processors, simply because they don't apply, and people have to do it
by hand.

But I also learned that a lot of the wheel reinventing is done for
fun. I'm not going to comment on that. Next time I take a plane I can
only cross my fingers that the people who designed the air traffic
control system optimized for something other than their programmers'
fun.

So I reiterate my point: there are well-known techniques for
maximizing streaming and minimizing memory consumption. Many of them
are already implemented in existing systems, and many will show up in
the next versions of various industrial-strength products.

In the great majority of cases, people who need to process XML don't
need to understand the gory details of buffer management. And they
shouldn't. They should concentrate only on the logic of their
application, and rely on good XSLT/XQuery compilers and runtimes to
choose the right implementation strategy.

As for the well-known techniques for minimizing memory consumption, I
am afraid that I cannot point to any specific technique on this
mailing list, for the following reasons:

(a) it's too much literature to be discussed in such a forum
(b) a lot of it is folklore
(c) a lot of it is simply inherited from the streaming and lazy
evaluation of SQL query processors, using the iterator model. (Goetz
Graefe can tell you much more about that than I can, and he's closer
to you.) You can imagine how much folklore there is too after 30 years.
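
To make point (c) concrete, here is a minimal sketch of the iterator
model applied to XML, written in Python. It is an illustration under
stated assumptions, not the design of any particular product: the
operator names scan, select and project, the helper big_order_ids, and
the element names order, total and id are all made up for the example.
Each operator pulls one item at a time from the operator below it, so
nothing is parsed until the consumer asks for a result.

```python
import io
import xml.etree.ElementTree as ET


def scan(source, tag):
    """Leaf operator: yield each <tag> element once it is fully parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Control returns here only when the consumer asks for the
            # next item, so the subtree can be dropped safely.
            elem.clear()


def select(rows, predicate):
    """Filter operator: pass through only the rows matching predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, field):
    """Projection operator: keep only the text of one child element."""
    for row in rows:
        yield row.findtext(field)


def big_order_ids(source, threshold):
    """Compose the operators into a lazy, pull-based pipeline."""
    orders = scan(source, "order")
    big = select(orders, lambda o: float(o.findtext("total")) > threshold)
    return project(big, "id")


# Hypothetical sample input; a real source would be a large file or socket.
sample = io.BytesIO(
    b"<orders>"
    b"<order><id>1</id><total>5</total></order>"
    b"<order><id>2</id><total>50</total></order>"
    b"</orders>"
)
for order_id in big_order_ids(sample, 10):
    print(order_id)
```

In this sketch at most one order subtree holds its content at any
moment. The emptied order elements stay attached to the root, so memory
still grows slowly with the number of orders; a production engine
would discard those too and would buffer more cleverly.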

The best idea that comes to my mind is to encourage somebody to write
a survey of such techniques; that might be helpful.

My conclusion: please rely on good compilers, good optimizers and good
runtimes instead of writing XML processors by hand if you don't
*really* have to (and few people really have to). And trust the
vendors/open source implementors to produce such good compilers,
optimizers and runtimes when the time comes.

As far as I am concerned, the horse is dead; I don't have much else to
add.

Best regards, have a wonderful holiday season,
Dana
