Re: [xml-dev] combining XMLEvent lists

  My guess would be that "XMLEvent" refers to StAX events.

http://woodstox.codehaus.org/javadoc/stax-api/1.0/javax/xml/stream/events/XMLEvent.html

which is a parsed XML event (startDocument, startElement, characters, ...)
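
To illustrate what such events look like, here is a minimal sketch using only the StAX API that ships with the JDK (javax.xml.stream). The class name StaxEventDemo and the sample input are made up for illustration; the sketch simply labels each event the reader produces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of any library.
final class StaxEventDemo {

    /** Parses the XML text and returns a short label for each StAX event, in document order. */
    static List<String> eventLabels(String xml) {
        List<String> labels = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to deliver adjacent character data as a single event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument()) {
                    labels.add("startDocument");
                } else if (event.isStartElement()) {
                    labels.add("startElement:" + event.asStartElement().getName().getLocalPart());
                } else if (event.isCharacters()) {
                    labels.add("characters:" + event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    labels.add("endElement:" + event.asEndElement().getName().getLocalPart());
                } else if (event.isEndDocument()) {
                    labels.add("endDocument");
                }
                // Other event types (comments, processing instructions, ...) are ignored here.
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return labels;
    }

    public static void main(String[] args) {
        eventLabels("<page><id>42</id></page>").forEach(System.out::println);
    }
}
```

So a List of XMLEvents, in this reading, would be a list of already-parsed event objects rather than strings of unparsed XML.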


David A. Lee
dlee@calldei.com
http://www.xmlsh.org


On 9/28/2010 1:17 PM, Michael Kay wrote:
>
>  On 28/09/2010 4:13 PM, Johannes.Lichtenberger wrote:
>> On 09/28/2010 04:33 PM, Michael Kay wrote:
>>> Sounds fascinating, and I wish I had time to get involved. It would
>>> certainly be elegant if you could have both the productivity of writing
>>> this declaratively in XSLT and the performance of running it on Hadoop
>>> MapReduce. Intrinsically, the two seem to fit together hand in glove,
>>> but I suspect some engineering effort is needed to make it work.
>> Hello Michael,
>>
>> I think it would be too complicated to achieve the desired grouping with
>> Java. Do you think it makes sense to first serialize the results and
>> then use Saxon's XSLT 2.0 processor to achieve the results? Or do you
>> have any direct input from a List of XMLEvents to Saxon's XSLT
>> processor? I assume it reads XML-data from an InputSource or some kind
>> of a stream.
>
> I'm not sure whether "XMLEvent" is something I'm expected to know
> about: you said earlier
>
>   "I've got an Iterator with Lists (Java) out of XMLEvents, which are
>   serialized fragments"
>
> so I assume they are just strings containing unparsed XML. That's not 
> going to be a particularly efficient representation for processing, so 
> the first step will be to parse each one to a tree (for example, a 
> Saxon TinyTree).
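
As a rough sketch of that first step, assuming each fragment really is a string of well-formed XML: the code below parses one fragment into a tree. It uses the JDK's DOM parser rather than a Saxon TinyTree, purely so that it runs without extra libraries. The class name FragmentParser and the element names id and timestamp are assumptions for illustration, not taken from the actual data.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; a JDK DOM tree stands in for a Saxon TinyTree.
final class FragmentParser {

    /** Parses one serialized fragment (a well-formed XML string) into a DOM tree. */
    static Document parseFragment(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("fragment is not well-formed XML", e);
        }
    }

    /** Returns the text of the first element with the given tag name, or null if there is none. */
    static String childText(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        // The element names below are assumed for illustration only.
        Document doc = parseFragment(
            "<revision><id>7</id><timestamp>2010-09-28T16:13:00Z</timestamp></revision>");
        System.out.println(childText(doc, "id") + " @ " + childText(doc, "timestamp"));
    }
}
```

Once each fragment is a tree, the grouping key (page id plus revision timestamp) can be read from it instead of being dug out of a string.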
>
> You then said,
>
>   "I want to find and combine Lists which have the same page id and the same
>   revision timestamp"
>
> but you left out the critical information as to whether this would
> always combine elements that were adjacent in the list. If the groups
> are adjacent then you could potentially devise a strategy that avoids
> holding all the trees in memory at the same time.
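
One possible shape for such a strategy, sketched in plain Java under the assumption that items with the same key always arrive next to each other: only the group currently being collected is held in memory, and each finished group is handed to a callback before the next one starts. The names AdjacentGrouping and Revision, and the fields of Revision, are hypothetical. This is the same idea as group-adjacent in XSLT 2.0's xsl:for-each-group, not Saxon's implementation of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: groups adjacent items that share a key, one group at a time.
final class AdjacentGrouping {

    /** Illustrative stand-in for a parsed revision; the fields are assumptions. */
    static final class Revision {
        final long pageId;
        final String timestamp;
        final String content;

        Revision(long pageId, String timestamp, String content) {
            this.pageId = pageId;
            this.timestamp = timestamp;
            this.content = content;
        }

        /** Grouping key: page id plus revision timestamp. */
        String key() {
            return pageId + "|" + timestamp;
        }
    }

    /**
     * Walks the iterator once and passes each run of adjacent items with equal keys
     * to onGroup. Only the current run is buffered, never the whole input.
     */
    static <T, K> void groupAdjacent(Iterator<T> items, Function<T, K> keyOf,
                                     Consumer<List<T>> onGroup) {
        List<T> current = new ArrayList<>();
        K currentKey = null;
        while (items.hasNext()) {
            T item = items.next();
            K key = keyOf.apply(item);
            if (!current.isEmpty() && !Objects.equals(key, currentKey)) {
                onGroup.accept(current);      // the previous run is complete
                current = new ArrayList<>();  // start buffering the next run
            }
            current.add(item);
            currentKey = key;
        }
        if (!current.isEmpty()) {
            onGroup.accept(current);          // flush the final run
        }
    }

    public static void main(String[] args) {
        List<Revision> input = Arrays.asList(
            new Revision(1, "2010-09-28T16:13:00Z", "a"),
            new Revision(1, "2010-09-28T16:13:00Z", "b"),
            new Revision(2, "2010-09-28T16:13:00Z", "c"));
        groupAdjacent(input.iterator(), Revision::key,
            group -> System.out.println(group.get(0).key() + " has " + group.size() + " item(s)"));
    }
}
```

If equal keys are not guaranteed to be adjacent, this produces several separate groups for the same key, which is exactly why the adjacency question matters.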
>
> Supplying a sequence of trees as input to Saxon grouping is not a
> problem. Using the s9api interface, you can use a DocumentBuilder to
> build each tree as an XdmNode, then a sequence can be constructed using
> the constructor public XdmValue(Iterable<XdmItem> items), and then this
> XdmValue can be passed as a parameter to an XsltTransformer, and a
> reference to the parameter can be used in
> <xsl:for-each-group select="$param">. Using this approach the whole
> structure will be held in memory, but there are ways of avoiding that
> by going to lower-level interfaces.
>
> Michael Kay
> Saxonica
>
>
>> It's a special case, where two or more revisions of one article are made
>> at the same time (in the same second). I would have to look through the
>> XML file with BaseX or Saxon, but I'm pretty sure such cases exist
>> somewhere in the huge file (as of now I've only extracted a small subset
>> of articles and replaced WikiText inside text-elements with XML).
>>
>> The whole task is to sort the revisions so that they can be shredded
>> into our XML data storage system (the deltas of the revisions), which
>> has the capability to store and retrieve revisions compactly and
>> efficiently. In parallel I'm currently writing the import of a sorted
>> XML file.
>>
>> My main task (master project and thesis) is or will be the visualization
>> of temporal tree structured data to gain further insights into the
>> evolution of the data, which are otherwise very difficult to realize.
>>
>> regards,
>> Johannes
>>
>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php



Copyright 1993-2007 XML.org. This site is hosted by OASIS