XML.org OASIS Mailing List Archives

 


Re: [xml-dev] I processed a 3GB XML file ... using XSLT streaming

Costello, Roger L. wrote:

> I processed this 3GB Open Street Map (OSM) XML file for the state of Massachusetts:
>
> http://downloads.cloudmade.com/americas/northern_america/united_states/massachusetts/massachusetts.osm.bz2
>
> An OSM file consists of <node> elements (a node represents a point), <way> elements (a way represents a line or area), and <relation> elements (a relation represents a relationship between other elements). This page describes the elements:
>
> http://wiki.openstreetmap.org/wiki/Tags
>
> Here is a snippet of the XML file; it shows the data for one of the schools in Massachusetts (a school is identified by @k="amenity" @v="school"):



> Problem: What are all the schools in the state of Massachusetts?
>
> I was able to answer that problem using the new streaming capability in XSLT 3.0.
>
> Below are two versions of an XSLT program. The first version (non-streaming version) is how one might try to solve the problem prior to XSLT streaming. I ran it on the XML file and it quickly halted with an "Out of memory" error message. The second version uses XSLT streaming and it solved the problem. There are over 6000 schools in Massachusetts! At the bottom of this message I show some of the schools.

> --------------------------------------------------
>      Streaming approach
> --------------------------------------------------
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                           xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                           exclude-result-prefixes="#all"
>                           version="3.0">
>
>      <xsl:output method="xml" />
>
>      <xsl:template match="/">
>          <xsl:stream href="../huge-file/massachusetts.xml">
>              <count>
>                  <xsl:for-each select="osm">
>                      <xsl:iterate select="node">

If 'a school is identified by @k="amenity" @v="school"', I wonder why 
that condition does not need to appear in your code.

>                          <xsl:param name="count" select="0" as="xs:decimal"/>
>                          <xsl:next-iteration>
>                              <xsl:with-param name="count" select="$count+1"/>
>                          </xsl:next-iteration>
>                          <xsl:on-completion>
>                              <xsl:value-of select="$count"/>
>                          </xsl:on-completion>
>                      </xsl:iterate>
>                  </xsl:for-each>
>              </count>
>          </xsl:stream>
>      </xsl:template>
>
> </xsl:stylesheet>
>
> --------------------------------------------------
>      Schools in Massachusetts
> --------------------------------------------------
> <Schools>
>      <school>1. Wayland Middle School</school>
>      <school>2. Walnut Hill School for the Arts</school>

Where do you generate that list? The streaming version of your program 
seems to output only a 'count' element, not 'Schools' or 'school' 
elements.
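
To make the two points above concrete, here is a minimal sketch of what I would expect such a program to do: test each node for the amenity=school tag and emit its name, while keeping memory use bounded. It is written in Python with the standard library's `xml.etree.ElementTree.iterparse`, not in XSLT, so it is a different technique from the streaming stylesheet and only illustrates the logic. The function name `iter_school_names`, the `SAMPLE` data, the school names in it, and the assumption that a school's name is held in a `tag` with `k="name"` are all hypothetical and not taken from the original message.

```python
import io
import xml.etree.ElementTree as ET


def iter_school_names(source):
    """Yield the name of every <node> that carries k="amenity" v="school".

    The input is read incrementally; each direct child of the root element
    is discarded once it has been examined, so memory use stays bounded.
    """
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first start event is the root <osm> element
            depth += 1
            continue
        depth -= 1
        if depth != 1:
            continue  # act only when a direct child of the root has just closed
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            # This is the condition that is missing from the quoted stylesheet.
            if tags.get("amenity") == "school":
                yield tags.get("name", "(unnamed)")
        root.clear()  # drop the children processed so far


# Hypothetical sample data, invented for illustration only.
SAMPLE = b"""<osm>
  <node id="1" lat="42.0" lon="-71.0">
    <tag k="amenity" v="school"/>
    <tag k="name" v="Example School A"/>
  </node>
  <node id="2" lat="42.1" lon="-71.1">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="3" lat="42.2" lon="-71.2"/>
  <way id="4"><nd ref="1"/><tag k="amenity" v="school"/></way>
  <node id="5" lat="42.3" lon="-71.3">
    <tag k="name" v="Example School B"/>
    <tag k="amenity" v="school"/>
  </node>
</osm>"""

names = list(iter_school_names(io.BytesIO(SAMPLE)))
for position, name in enumerate(names, start=1):
    print(f"{position}. {name}")
```

Like the quoted stylesheet, this sketch looks only at node elements, so a school mapped as a way would not be listed. Passing a file opened in binary mode instead of the `io.BytesIO` object should work the same way.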





Copyright 1993-2007 XML.org. This site is hosted by OASIS