- To: "Daniela Florescu" <firstname.lastname@example.org>,"XML Developers List" <email@example.com>
- Subject: RE: [xml-dev] Streaming XML (WAS: More on taming SAX (was Re: [xml-dev] ANN: Amara XML Toolkit 0.9.0))
- From: "Dare Obasanjo" <firstname.lastname@example.org>
- Date: Tue, 28 Dec 2004 07:10:55 -0800
- Thread-index: AcTsTt1eEeC8KsgaSEm2Yor72/U5RAAn5+IC
- Thread-topic: [xml-dev] Streaming XML (WAS: More on taming SAX (was Re: [xml-dev] ANN: Amara XML Toolkit 0.9.0))
As someone who was until very recently "one of those implementers," I completely disagree with you. We had customers who wanted to process XML documents that are hundreds of megabytes to gigabytes in size and who, in certain cases, couldn't afford to materialize even a fraction of these documents. Then there were customers who wanted to process thousands of XML documents per minute and couldn't afford the overhead of object creation, memory consumption, and GC. Using XQuery or XSLT in such scenarios, even with various optimization tricks, just wouldn't cut it.
Every paper I've seen on streaming XML either assumes some form of forward-only processing or is just wrong. Instead of telling folks to use Google Scholar or CiteSeer to find relevant work, are there any techniques in any particular papers you want to highlight?
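As a minimal illustration of the forward-only processing mentioned above (not Dare's actual system; the element and attribute names here are invented), Python's ElementTree `iterparse` lets you walk a huge document without ever building the full tree:

```python
# Hedged sketch: forward-only streaming over a large XML document.
# Each <order> element is inspected and then cleared, so memory use
# stays roughly bounded regardless of document size.
import io
import xml.etree.ElementTree as ET

def count_orders_over(xml_stream, threshold):
    """Count <order> elements whose 'total' attribute exceeds threshold,
    discarding each element as soon as it has been inspected."""
    count = 0
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "order":
            if float(elem.get("total", "0")) > threshold:
                count += 1
            # Drop this element's children and attributes; a fuller
            # version would also detach it from its parent.
            elem.clear()
    return count

doc = io.BytesIO(b'<orders><order total="5"/><order total="150"/>'
                 b'<order total="99"/></orders>')
print(count_orders_over(doc, 100))  # -> 1
```

This is the same shape as a SAX pipeline, but with less callback boilerplate; the trade-off is that anything you need later, you must copy out yourself before clearing.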
PITHY WORDS OF WISDOM
Eat right, Exercise, Die anyway.
From: Daniela Florescu [mailto:email@example.com]
Sent: Mon 12/27/2004 12:00 PM
To: XML Developers List
Subject: Re: [xml-dev] Streaming XML (WAS: More on taming SAX (was Re: [xml-dev] ANN: Amara XML Toolkit 0.9.0))
> I've thought about using an XPath tracker in error reporting to
> my library, which would be very simple to add at this point, and
> it's necessary, I think, because the document locator loses
> meaning when I chain together a bunch of SAX filters.
> In any case, I'm reading through some of the other articles
> you've been posting. This is a very interesting discussion.
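The XPath-tracker idea in the quoted message can be sketched with a plain SAX handler that keeps a stack of open element names, so an error message can report a path even when the document locator has lost meaning across chained filters (a hypothetical illustration, not the library's actual code):

```python
# Hedged sketch: track the current element path during a SAX parse
# so errors can be reported with an XPath-like location string.
import xml.sax

class PathTracker(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.stack = []       # names of currently open elements
        self.paths_seen = []  # every path entered, for demonstration

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.paths_seen.append(self.current_path())

    def endElement(self, name):
        self.stack.pop()

    def current_path(self):
        # e.g. '/a/b/c' while inside <a><b><c>
        return "/" + "/".join(self.stack)

handler = PathTracker()
xml.sax.parseString(b"<a><b><c/></b></a>", handler)
print(handler.paths_seen)  # -> ['/a', '/a/b', '/a/b/c']
```

In a real filter chain, `current_path()` would be consulted at the point an error is raised rather than collected into a list.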
I read with great interest the whole discussion about XML streaming, and I have to admit that I am very confused by it.
Could you guys please try to clarify for me the answer to the following question: instead of hand coding streaming applications using SAX, couldn't you write some XQuery code (with external functions, probably) to do the same thing? Did you at least try? Did you try and fail? If yes, why did it fail?
My hope is that at a certain point people will stop writing low level code and will instead rely on good implementations of XQuery to do the right amount of streaming, in the optimal way. That should be the vendor's problem, not the user's.
Another question: why do you people care about "perfect" streaming, i.e. with zero memory consumption? Between perfect streaming and total materialization there is a whole world of possibilities, where materialization happens, but restricted to the minimum amount of data required to compute the answer, and only for the minimum amount of time necessary to compute the correct answer. Perfect streaming happens too rarely to be of any interest. What is interesting is all this world in between.
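That "world in between" can be illustrated with a hedged sketch: stream the document, but materialize only a small running aggregate rather than the tree (element and attribute names are invented for illustration):

```python
# Hedged sketch of bounded materialization: the only state held in
# memory is one running total per category, never the document itself.
import io
import xml.etree.ElementTree as ET

def total_per_category(xml_stream):
    totals = {}  # category -> running sum: the only retained state
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "item":
            cat = elem.get("cat")
            totals[cat] = totals.get(cat, 0) + float(elem.get("price"))
            elem.clear()  # discard the element once aggregated
    return totals

doc = io.BytesIO(b'<items><item cat="a" price="1"/>'
                 b'<item cat="b" price="2"/>'
                 b'<item cat="a" price="3"/></items>')
print(total_per_category(doc))  # -> {'a': 4.0, 'b': 2.0}
```

Memory here is proportional to the number of distinct categories, not to document size; this is exactly the kind of minimal materialization a good XQuery or XSLT optimizer could, in principle, choose automatically.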
Anyway, I believe that people shouldn't try to hand code their applications using low level APIs like SAX or StAX, but should use a higher level language like XSLT or XQuery and trust that the XQuery/XSLT implementors will do a good job of minimizing memory consumption. That's *their* job, not *yours* as users.
But anyway, for those interested in streaming XML processing, the existing research might come in handy. There have been several studies of the problem in the literature. For example, you could find some of it on Google Scholar or CiteSeer by searching for "streaming XML"; starting from there you might find some more.
Best regards, happy holidays,
Daniela
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>
The list archives are at http://lists.xml.org/archives/xml-dev/
To subscribe or unsubscribe from this list use the subscription manager.