I processed a 3GB XML file ... using XSLT streaming
- From: "Costello, Roger L." <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Fri, 13 Sep 2013 15:38:04 +0000
Hi Folks,
I processed this 3GB Open Street Map (OSM) XML file for the state of Massachusetts:
http://downloads.cloudmade.com/americas/northern_america/united_states/massachusetts/massachusetts.osm.bz2
An OSM file consists of <node> elements (a node represents a point), <way> elements (a way represents a line or area), and <relation> elements (a relation represents a relationship between other elements). This page describes the elements:
http://wiki.openstreetmap.org/wiki/Tags
Here is a snippet of the XML file; it shows the data for one of the schools in Massachusetts (a school is a <node> containing a <tag> with @k="amenity" and @v="school"):
<osm version="0.6" generator="osm-extract.pl">
    ...
    <node id="358264143" version="1" timestamp="2009-03-10T04:54:34Z"
          uid="4732" user="iandees" changeset="774950" lat="42.2017681"
          lon="-70.7561527">
        <tag k="gnis:created" v="08/27/2002"/>
        <tag k="gnis:county_id" v="023"/>
        <tag k="name" v="Scituate Center Central School"/>
        <tag k="amenity" v="school"/>
        <tag k="gnis:feature_id" v="602607"/>
        <tag k="gnis:state_id" v="25"/>
        <tag k="ele" v="34"/>
    </node>
    ...
</osm>
Problem: What are all the schools in the state of Massachusetts?
I was able to answer that problem using the new streaming capability in XSLT 3.0.
Below are two versions of an XSLT program. The first version (non-streaming version) is how one might try to solve the problem prior to XSLT streaming. I ran it on the XML file and it quickly halted with an "Out of memory" error message. The second version uses XSLT streaming and it solved the problem. There are over 6000 schools in Massachusetts! At the bottom of this message I show some of the schools.
How long did it take for the streaming XSLT program to process the 3GB XML file? Answer: 134 seconds.
I ran my streaming XSLT program from oXygen XML, which invoked Saxon, on my laptop running Windows 7.
--------------------------------------------------
Non-streaming approach
--------------------------------------------------
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">
    <xsl:output method="xml" />
    <!-- doc() builds the whole 3GB document as an in-memory tree -->
    <xsl:variable name="MA" select="doc('../huge-file/massachusetts.xml')" />
    <xsl:template match="/">
        <xsl:apply-templates select="$MA/osm" />
    </xsl:template>
    <xsl:template match="osm">
        <Schools>
            <xsl:for-each select="node">
                <!-- Keep only the nodes tagged amenity=school -->
                <xsl:if test="tag[(@k eq 'amenity') and (@v eq 'school')]">
                    <school>
                        <xsl:value-of select="tag[@k eq 'name']/@v" />
                    </school>
                </xsl:if>
            </xsl:for-each>
        </Schools>
    </xsl:template>
</xsl:stylesheet>
--------------------------------------------------
Streaming approach
--------------------------------------------------
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">
    <xsl:output method="xml" />
    <xsl:template match="/">
        <!-- xsl:stream reads the file as a stream instead of building a tree -->
        <xsl:stream href="../huge-file/massachusetts.xml">
            <count>
                <xsl:for-each select="osm">
                    <!-- Visit each <node> in turn, carrying a running total in $count -->
                    <xsl:iterate select="node">
                        <xsl:param name="count" select="0" as="xs:decimal"/>
                        <xsl:next-iteration>
                            <xsl:with-param name="count" select="$count+1"/>
                        </xsl:next-iteration>
                        <xsl:on-completion>
                            <!-- After the last <node>, output the total -->
                            <xsl:value-of select="$count"/>
                        </xsl:on-completion>
                    </xsl:iterate>
                </xsl:for-each>
            </count>
        </xsl:stream>
    </xsl:template>
</xsl:stylesheet>
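The streaming listing above, as shown, totals the <node> elements. Here is a minimal sketch of how the school names could be listed and numbered in the same streaming style. It is an illustration under assumptions, not necessarily the stylesheet that produced the output below. It assumes the processor supports the XSLT 3.0 copy-of() function, so that each <node> can be copied into memory one at a time and then examined with ordinary XPath. It keeps the xsl:stream instruction used above; the final XSLT 3.0 Recommendation renamed it to xsl:source-document with streamable="yes".

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">
    <xsl:output method="xml" />
    <xsl:template match="/">
        <xsl:stream href="../huge-file/massachusetts.xml">
            <Schools>
                <!-- Assumption: copy-of(.) is supported. Each <node> is copied
                     into memory on its own, so only one node is held at a time. -->
                <xsl:iterate select="osm/node/copy-of(.)">
                    <!-- $count is the number to give the next school found -->
                    <xsl:param name="count" select="1" as="xs:integer"/>
                    <xsl:if test="tag[@k eq 'amenity' and @v eq 'school']">
                        <school>
                            <xsl:value-of select="$count, '. ', tag[@k eq 'name']/@v"
                                          separator=""/>
                        </school>
                        <!-- Advance the counter only when a school was output;
                             otherwise $count carries over unchanged -->
                        <xsl:next-iteration>
                            <xsl:with-param name="count" select="$count + 1"/>
                        </xsl:next-iteration>
                    </xsl:if>
                </xsl:iterate>
            </Schools>
        </xsl:stream>
    </xsl:template>
</xsl:stylesheet>
```

Copying each <node> first is what lets the stylesheet both test one <tag> child and read another. In a streamed pass without the copy, the children of a node can be visited only once.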
--------------------------------------------------
Schools in Massachusetts
--------------------------------------------------
<Schools>
    <school>1. Wayland Middle School</school>
    <school>2. Walnut Hill School for the Arts</school>
    <school>3. Library</school>
    <school>4. Resource Center</school>
    <school>5. Deerfield Academy</school>
    <school>6. Arthur W Coolidge Middle</school>
    <school>7. Lilliput School</school>
    <school>8. L H Coffin</school>
    <school>9. St Joseph Central High</school>
    <school>10. Stoneham High School</school>
    <school>11. St Louis</school>
    <school>12. Franklin</school>
    <school>13. East Falmouth Elem</school>
    <school>14. May Institute (Woburn)</school>
    <school>15. Frolio Jr Hs</school>
    <school>16. Barbieri Elem</school>
    <school>17. Bishop Stang High</school>
    <school>18. Jackson Mann</school>
    <school>19. Douglas</school>
    <school>20. Brookfield Elementary</school>
    ...
    <school>5996. Hawley Grammar School (was here)</school>
    <school>5997. Shady Hill School</school>
    <school>5998. Cotting School</school>
    <school>5999. Morril 2</school>
    <school>6000. Morril 4 North</school>
    <school>6001. Morril 3</school>
    <school>6002. John W Wynn Middle School</school>
    <school>6003. John W. Wynn Middle</school>
    <school>6004. Boston Architectural College</school>
    <school>6005. High School</school>
    <school>6006. Sargent College</school>
    <school>6007. Stop -n- Go Driving Academy</school>
    <school>6008. Ashland High School</school>
    <school>6009. Center Stage Dance Academy</school>
</Schools>