   Re: [xml-dev] Processing huge XML files


Thomas Lee <ytlee@cecid.hku.hk> writes:

> We need to process very large XML files (up to a few tens of MB), so we
> can't process the whole file with DOM in memory. We plan to use SAX to
> parse a large XML file and store it in our own data structure on disk
> (a proprietary data structure or an RDBMS). We'll also provide a set of
> APIs for accessing the contents of the XML file. For now, changing the
> XML content is not required.

In the computational linguistics domain, we regularly work with XML
documents in the 100 MB to GB range.  For many of our applications,
streaming processing using a hybrid pull/tree-fragment API allows very
efficient processing -- see our LT XML toolkit [1] and a paper about
its use [2].
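
As a rough illustration of the hybrid pull/tree-fragment idea, here is a
minimal Python sketch using the standard library's
xml.etree.ElementTree.iterparse. It is not the LT XML API. The helper
name iter_fragments, the <corpus>/<s>/<w> element names, and the
assumption that each record element is a direct child of the root (and
not nested inside itself) are all hypothetical choices for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_fragments(source, tag):
    """Pull complete <tag> elements one at a time as small in-memory trees.

    Assumes every <tag> element is a direct child of the document root,
    so clearing the root after each fragment releases the processed one.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem       # caller sees a fully built subtree
            root.clear()     # then drop it so memory use stays bounded


# Small in-memory stand-in for a large file on disk.
sample = (
    b"<corpus>"
    b"<s id='1'><w>Hello</w><w>world</w></s>"
    b"<s id='2'><w>Bye</w></s>"
    b"</corpus>"
)

# Count the <w> children of each <s> fragment as it streams past.
counts = {
    s.get("id"): len(s.findall("w"))
    for s in iter_fragments(io.BytesIO(sample), "s")
}
print(counts)
```

Each fragment can be queried like an ordinary tree while it is in hand,
but only one fragment is held at a time, so memory use depends on the
size of the largest record rather than the size of the whole document.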


[1] http://www.ltg.ed.ac.uk/software/xml/
[2] http://www.ltg.ed.ac.uk/~dmck/Papers/chum.ps
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2002, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]



Copyright 2001 XML.org. This site is hosted by OASIS