Re: [xml-dev] Disk-based XPath Processing
- From: Tatu Saloranta <cowtowncoder@yahoo.com>
- To: Philippe Poulard <Philippe.Poulard@sophia.inria.fr>, Uche Ogbuji <uche@ogbuji.net>
- Date: Mon, 2 Oct 2006 21:26:09 -0700 (PDT)
Thanks Philippe, Michael and Uche -- it is great to
learn that there are indeed efforts to allow XPath
access in a streaming manner. It has definitely been
talked about a lot (as in "wouldn't it be nice to be
able to..."), so it's encouraging that there is
progress too.
On a related note: I noticed a reference (from the
Saxon-SA page, I think) implying that W3C Schema
specifies an XPath subset (perhaps for defining
key constraints?), one that might also be useful for
streaming access.
Is this true?
Or are there other formal subset specifications that
make it easier to know exactly which subset is
(guaranteed to be) supported by an implementation?
-+ Tatu +-
--- Philippe Poulard
<Philippe.Poulard@sophia.inria.fr> wrote:
> Uche Ogbuji wrote:
> > Tatu Saloranta wrote:
> >
> >>Alas, although there is quite a bit of interest, I
> >>haven't seen solutions where streaming parsers could
> >>use some suitable subset of XPath to match sub-trees
> >>(suitable meaning that only some axes were supported,
> >>parent/grandparent, attribute, children, but not
> >>sibling). I have been hoping to investigate doing this
> >>myself in near future, since it would seem to simplify
> >>some streaming-oriented tasks (like only building
> >>small sub-trees, or one sub-tree at a time from a
> >>bigger document).
> >
> >
> > What you describe in the above para is pretty much exactly what Amara's
> > pushbind and pushdom allow, and the trimxml tool that John L. Clark
> > mentions exposes this approach on the command line. They use a subset
> > of XSLT patterns (which are themselves a subset of XPath, as defined
> > in the XSLT 1.0 spec) to drive a streamable operation that only loads
> > into memory one subtree at a time from a larger document. I think it
> > does still need a little baking, but I've been successful using it for
> > some pretty heavy-duty work.
> >
>
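To make the "one subtree at a time" idea above concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree.iterparse. It is not Amara's pushbind/pushdom API and it matches on a bare element name rather than an XSLT pattern; the function name iter_subtrees and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_subtrees(source, tag):
    """Yield each outermost <tag> element once it has been fully parsed.

    Hypothetical helper: it matches on the element name only.
    """
    depth = 0  # number of currently open <tag> elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != tag:
            continue
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                yield elem    # the subtree is complete at this point
                elem.clear()  # drop its content once the caller moves on


if __name__ == "__main__":
    sample = b"<log><entry id='1'><msg>a</msg></entry><entry id='2'><msg>b</msg></entry></log>"
    for entry in iter_subtrees(io.BytesIO(sample), "entry"):
        print(entry.get("id"), entry.findtext("msg"))
```

The caller has to finish with each subtree before asking for the next one, because the element is emptied as soon as iteration resumes. Elements outside the matched subtrees are still retained by ElementTree in this sketch, so it only bounds the memory used by the matched parts.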
> hi,
>
> A few months ago I worked on XPath filtering on SAX streams; it
> supports XPath patterns with predicates and forward axes, etc.,
> like this:
>
> a[@b]
> a[not(@b)]
> a[@b='c']
> a[@b='c']/d[@e]
> /a/b/c[1]
> a/*[2]
> a/comment()[3]
> a/node()[position() < 4]
> /a/b/c[last()]
> a/*[count() > 3]
> a/node()[last()]
> a[following-sibling::b]
> a[b]
> a[*[not(self::b)]]
> id("foo")
> id("foo")/child::para[position()=5]/a/b/c[last()]
>
> but you should be aware that:
> - when parsing, if you use an expression that requires reading the
> whole tree, the whole tree will be cached, and you should use DOM
> instead; that is to say, if you do silly things, that is what you'll
> get; if you have a really huge XML file, don't do such things,
> otherwise you'll get an OutOfMemory error
> - once a node has been discarded, you can't reach it again: reverse
> axes (except the ancestor axes) are not available; the only thing you
> can do is anticipate by storing part of the tree in a DOM fragment,
> working with it, then discarding it
>
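A rough sketch of why the ancestor axes stay reachable on a stream while discarded nodes do not: a SAX handler only needs a stack of the currently open elements. The Python below hand-codes a matcher for one of the patterns listed above, a[@b='c']/d[@e]. It is not the RefleX implementation (which is in Java); the class name PatternFilter and the sample input are assumptions made for illustration.

```python
import xml.sax


class PatternFilter(xml.sax.ContentHandler):
    """Hand-coded matcher for the pattern a[@b='c']/d[@e] (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of (name, attributes) for open ancestors
        self.matches = []        # paths of the elements that matched

    def startElement(self, name, attrs):
        parent = self.open_elements[-1] if self.open_elements else None
        if (name == "d" and "e" in attrs
                and parent is not None
                and parent[0] == "a" and parent[1].get("b") == "c"):
            ancestors = "/".join(n for n, _ in self.open_elements)
            self.matches.append("/" + ancestors + "/" + name)
        # Ancestors remain available only while they are still open.
        self.open_elements.append((name, dict(attrs.items())))

    def endElement(self, name):
        # Once an element is closed it is gone: nothing keeps its siblings.
        self.open_elements.pop()


if __name__ == "__main__":
    handler = PatternFilter()
    xml.sax.parseString(
        b"<root><a b='c'><d e='1'/><d/></a><a><d e='2'/></a></root>", handler)
    print(handler.matches)
```

Everything the match needs (the element's own attributes and its parent's) is known when startElement fires, so nothing has to be buffered. A predicate that looks ahead, such as a[b], could not be decided at that point and would need the kind of temporary fragment described above.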
> the technique used is described on
> http://reflex.gforge.inria.fr/saxPatterns.html
> (this is a preview)
>
> the implementation is in Java and is part of the RefleX engine
> (http://reflex.gforge.inria.fr/); unfortunately, I haven't yet
> published the latest release with all that stuff; however, you can
> browse the SVN repository if you are (very very very) curious:
>
> https://gforge.inria.fr/plugins/scmsvn/viewcvs.php/root/src/java/org/inria/reflex/xml/filter/?rev=104&root=reflex
>
> https://gforge.inria.fr/plugins/scmsvn/viewcvs.php/root/src/java/org/inria/reflex/xml/sax/?root=reflex
>
> the upcoming version of RefleX will supply a set of tags for
> filtering SAX streams with XPath patterns; here are common use cases
> that XSLT users should find easy to understand:
>
> <xcl:filter xmlns:xcl="http://www.inria.fr/xml/active-tags/xcl">
>
>   <!-- copy -->
>   <xcl:rule pattern="copy">
>     <xcl:forward>
>       <xcl:apply-rules/>
>     </xcl:forward>
>   </xcl:rule>
>
>   <!-- delete the element and its content -->
>   <xcl:rule pattern="deleteElem"/>
>
>   <!-- ignore an element, but apply rules on its content -->
>   <xcl:rule pattern="ignoreElem">
>     <xcl:forward>
>       <insertedBefore/>
>     </xcl:forward>
>     <xcl:apply-rules/>
>     <xcl:forward>
>       <insertedAfter/>
>     </xcl:forward>
>   </xcl:rule>
>
>   <!-- insert a container -->
>   <xcl:rule pattern="content">
>     <xcl:forward>
>       <insertedContainer>
>         <xcl:apply-rules/>
>       </insertedContainer>
>     </xcl:forward>
>   </xcl:rule>
>
>   <!-- remove an attribute -->
>   <xcl:rule pattern="removeAttr">
>     <xcl:remove parent="{ . }" referent="{ @bar }"/>
>     <xcl:forward>
>       <xcl:apply-rules/>
>     </xcl:forward>
>   </xcl:rule>
>
>   <!-- remove all attributes -->
>   <xcl:rule pattern="removeAllAttr">
>     <xcl:remove parent="{ . }" referent="{ @* }"/>
>     <xcl:forward>
>       <xcl:apply-rules/>
>     </xcl:forward>
>   </xcl:rule>
>
>   <!-- change the value of an attribute -->
>   <xcl:rule pattern="changeAttr">
>     <xcl:attribute referent="{ . }" name="foo" value="foo"/>
>     <xcl:forward>
>       <xcl:apply-rules/>
>     </xcl:forward>
>   </xcl:rule>
>
> </xcl:filter>
>
> A filter reads one or several inputs in their entirety, and can
> produce several outputs. Unlike XSLT, an XCL filter traverses each
> input tree in its natural order only. More complex processes that
> require deep structural transformations should be handled with XSLT.
> XCL filters are suitable when processing is localized to independent
> chunks of data, which is advantageous for stream processing of large
> inputs, although XCL filters can also be convenient for automatically
> traversing a DOM tree. By combining other active tags with the small
> set defined here, it is also possible to achieve efficient pipeline
> processes.
>
> of course, you can combine these basic structures at will, as long as
> you use a single <xcl:apply-rules/> element
>
> of course, several filters can be connected into a pipeline, including
> steps that involve XSLT filtering and XInclude processing
>
> of course, this kind of filter will be applicable to SAX streams and
> DOM trees, at the user's option
>
> --
> Cordialement,
>
> ///
> (. .)
> --------ooO--(_)--Ooo--------
> | Philippe Poulard |
>
=== message truncated ===