XML.org
OASIS Mailing List Archives

Re: [xml-dev] I processed a 3GB XML file ... using XSLT streaming

And because Peter mentioned this, here are two facts one should be aware of:

1. When streaming, one doesn't know until the end whether the stream is
"well-formed" or not. If a well-formedness error turns up after many
hours of processing, it is likely that some side effect (such as sending
or posting) has already been produced that cannot be undone, or that it
is too late to undo. Are we missing the "(long) transaction" concept here?

2. People usually forget that an XML document is two-dimensional.
Even XML documents that aren't big in size can choke any streaming
processor: one only needs to construct a document with sufficient
depth. Because a streaming processor needs to keep track of all
ancestors of the current node, it will crash at some depth.
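
The deferral idea behind point 1 can be sketched in Python (a minimal illustration under stated assumptions, not an XSLT feature): queue side effects while streaming and perform them only after the parser has reached the end of the input without a well-formedness error. The `<order id="...">` vocabulary and the names `DeferredActionHandler` and `process_stream` are hypothetical, chosen only for this example.

```python
import io
import xml.sax


class DeferredActionHandler(xml.sax.ContentHandler):
    """Records side effects instead of performing them immediately.

    Hypothetical vocabulary: each <order id="..."/> element would normally
    trigger a send/post; here only the intent is queued.
    """

    def __init__(self):
        super().__init__()
        self.pending = []  # side effects queued while streaming

    def startElement(self, name, attrs):
        if name == "order":
            self.pending.append(attrs.get("id"))


def process_stream(stream, commit):
    """Stream-parse the input; call commit(action) only if it is well-formed."""
    handler = DeferredActionHandler()
    try:
        xml.sax.parse(stream, handler)
    except xml.sax.SAXParseException:
        # A well-formedness error was found late: discard everything queued.
        return False
    # The whole stream parsed cleanly, so it is now safe to "commit".
    for action in handler.pending:
        commit(action)
    return True


if __name__ == "__main__":
    sent = []
    good = io.BytesIO(b'<orders><order id="1"/><order id="2"/></orders>')
    print(process_stream(good, sent.append), sent)  # True ['1', '2']
    sent = []
    bad = io.BytesIO(b'<orders><order id="1"/><order id="2"/></oops>')
    print(process_stream(bad, sent.append), sent)  # False []
```

The trade-off is that the queue of pending actions grows with the input, which partly reintroduces the memory cost that streaming is meant to avoid; a real system might instead spool the queue to disk or design compensating actions.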

So, "size is not all" :), or do we need to redefine what "huge" means
with respect to an XML document?
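
The depth problem from point 2 can be illustrated with a similar hypothetical Python sketch: a SAX handler that keeps the ancestor chain as a stack, fed a document that is only a few bytes per level but very deep. The stack, and so the memory needed to hold it, grows linearly with nesting depth regardless of how little text the document contains; the depth at which a particular real processor gives up depends on its own limits and on how much it stores per ancestor. The names `AncestorTracker`, `nested_document` and `max_ancestor_depth` are illustrative only.

```python
import xml.sax


class AncestorTracker(xml.sax.ContentHandler):
    """Keeps the chain of open ancestor elements, as a streaming
    processor must do to answer questions about the current node's ancestors."""

    def __init__(self):
        super().__init__()
        self.ancestors = []  # stack of currently open element names
        self.max_depth = 0

    def startElement(self, name, attrs):
        self.ancestors.append(name)
        self.max_depth = max(self.max_depth, len(self.ancestors))

    def endElement(self, name):
        self.ancestors.pop()


def nested_document(depth):
    """Build a document that is tiny per level but `depth` levels deep."""
    return b"<a>" * depth + b"</a>" * depth


def max_ancestor_depth(data):
    """Return the largest ancestor-stack size reached while streaming `data`."""
    handler = AncestorTracker()
    xml.sax.parseString(data, handler)
    return handler.max_depth
```

For example, `nested_document(1000)` is only 7,000 bytes, yet the handler must hold 1,000 open ancestors at its deepest point.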


Cheers,
Dimitre

On Fri, Sep 13, 2013 at 12:47 PM, Peter Hunsberger
<peter.hunsberger@gmail.com> wrote:
> It might be worth noting that streaming XML has been around for many
> years now (in spite of the W3C's belief in what makes a well-formed
> document).  I think the first custom XML parser I ever wrote was for the
> Sports Ticker data feed, probably not long after they first started up 15
> years ago...   The thing that has changed, and I think maybe the point of
> Roger's original post (whether he intended it or not), is the ability to use
> XSLT to handle the streaming data without having to resort to custom
> software.
>
> Peter Hunsberger
>
>
> On Fri, Sep 13, 2013 at 2:28 PM, Dimitre Novatchev <dnovatchev@gmail.com>
> wrote:
>>
>> > The point of streaming is (of course) being able to do such things
>> > without using much memory, even if it's slower. Not everyone has 96G, or
>> > even 16G of memory available... :-)
>>
>>
>> Absolutely true.
>>
>> Also, there could be cases where the XML data is generated continuously
>> and non-stop in real time, and must also be processed in real time.
>> In such scenarios even terabytes of RAM wouldn't help.
>>
>>
>> Cheers,
>> Dimitre
>>
>> On Fri, Sep 13, 2013 at 12:14 PM, Liam R E Quin <liam@w3.org> wrote:
>> > On Fri, 2013-09-13 at 20:53 +0200, Hermann Stamm-Wilbrandt wrote:
>> >> I did give your non-streaming stylesheet (B) a try (A).
>> >> Slight modifications were necessary to get back to XSLT 1.0
>> >> (eq -> = , doc -> document).
>> >> 16GB of memory were used (17851470K-1067694K), (D).
>> >
>> > The point of streaming is (of course) being able to do such things
>> > without using much memory, even if it's slower. Not everyone has 96G, or
>> > even 16G of memory available... :-)
>> >
>> > Liam
>> >
>> > --
>> > Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
>> > Pictures from old books: http://fromoldbooks.org/
>> > Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
>> >
>> >
>> > _______________________________________________________________________
>> >
>> > XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> > to support XML implementation and development. To minimize
>> > spam in the archives, you must subscribe before posting.
>> >
>> > [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> > Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> > subscribe: xml-dev-subscribe@lists.xml.org
>> > List archive: http://lists.xml.org/archives/xml-dev/
>> > List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>> >
>>
>>
>>
>> --
>> Cheers,
>> Dimitre Novatchev
>> ---------------------------------------
>> Truly great madness cannot be achieved without significant intelligence.
>> ---------------------------------------
>> To invent, you need a good imagination and a pile of junk
>> -------------------------------------
>> Never fight an inanimate object
>> -------------------------------------
>> To avoid situations in which you might make mistakes may be the
>> biggest mistake of all
>> ------------------------------------
>> Quality means doing it right when no one is looking.
>> -------------------------------------
>> You've achieved success in your field when you don't know whether what
>> you're doing is work or play
>> -------------------------------------
>> Facts do not cease to exist because they are ignored.
>> -------------------------------------
>> Typing monkeys will write all Shakespeare's works in 200yrs. Will they
>> write all patents, too? :)
>> -------------------------------------
>> I finally figured out the only reason to be alive is to enjoy it.
>>
>



-- 
Cheers,
Dimitre Novatchev


Copyright 1993-2007 XML.org. This site is hosted by OASIS