xml-dev - Re: [xml-dev] Handling very large instance docs


To: <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Handling very large instance docs
From: "Karl Waclawek" <karl@waclawek.net>
Date: Thu, 29 Apr 2004 10:28:30 -0400
References: <p06020415bcb6ad5e7762@[89.10.0.19]> <p05200000bcb6b91b64d3@[128.253.109.55]>



> >At the very least I need to be able to sequentially process a large
> >document and extract an identified sub-tree (ideally denoted by an
> >XPath expression) for run-of-the-mill tools to manipulate. I assume
> >such a beast would need to be based on a SAX parser.
> 
> I did exactly that in Python.  I considered building an engine that 
> could filter SAX events to those that match a limited version of 
> XPath, but ran out of gas.  I ended up with just a regular SAX 
> application.

Interesting - I always thought such a thing would be useful, but I
haven't come across an implementation.

I built something like that in Delphi (I call it SAXPath)
on top of SAX. First you define an array of records (structs)
each with a name (or wildcard) - like XPath - and a call-back
interface pointer (used for filtering/predicates or processing).
I call the array elements "path nodes", and the array "path handler".
Then you register such an array with a "handler manager" for processing.
Only relative paths are currently supported.

Call-backs are done on every node of such a "path handler"
as long as it matches and as long as filter-call-backs 
further up haven't de-activated the "path handler".
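
To make the idea concrete, here is a rough sketch in Python on top of
xml.sax. It is not the Delphi SAXPath code: the names PathNode and
HandlerManager, and the convention that a call-back returning False
de-activates the path for that subtree, are assumptions made up for
this illustration. Matching is relative, so a path can start at any
element.

```python
import xml.sax
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class PathNode:
    """One step of a path: an element name (or "*") plus an optional call-back."""
    name: str
    # Hypothetical convention: returning False de-activates the path below this node.
    callback: Optional[Callable[[str, dict], Optional[bool]]] = None

class HandlerManager(xml.sax.ContentHandler):
    """Dispatches SAX start-element events to registered path handlers."""

    def __init__(self):
        super().__init__()
        self.paths = []   # registered path handlers (lists of PathNode)
        self.stack = []   # per open element: (path, index) pairs matched there

    def register(self, path):
        self.paths.append(path)

    def startElement(self, name, attrs):
        # Relative paths: every path may begin matching at this element.
        candidates = [(p, 0) for p in self.paths]
        if self.stack:
            # Continue partial matches that were active on the parent element.
            candidates += [(p, i + 1) for p, i in self.stack[-1] if i + 1 < len(p)]
        active = []
        for path, i in candidates:
            node = path[i]
            if node.name not in ("*", name):
                continue
            if node.callback and node.callback(name, dict(attrs.items())) is False:
                continue  # a filter call-back de-activated this path here
            active.append((path, i))
        self.stack.append(active)

    def endElement(self, name):
        self.stack.pop()

# Usage: two path handlers over a small document.
DOC = (b'<lib><shelf id="a"><book isbn="1"/><box><book isbn="2"/></box></shelf>'
       b'<shelf id="b"><book isbn="3"/></shelf></lib>')

direct, nested = [], []
manager = HandlerManager()
# shelf[@id="a"]/book : the first node acts as a filter, the second processes.
manager.register([
    PathNode("shelf", lambda name, attrs: attrs.get("id") == "a"),
    PathNode("book", lambda name, attrs: direct.append(attrs["isbn"])),
])
# shelf/*/book : wildcard in the middle, no filter.
manager.register([
    PathNode("shelf"),
    PathNode("*"),
    PathNode("book", lambda name, attrs: nested.append(attrs["isbn"])),
])
xml.sax.parseString(DOC, manager)
```

Memory use in this sketch grows with element depth rather than
document size, which is the point for very large instance docs.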

For the projects I am involved in, this has proven very practical.

Karl

Follow-Ups:
- Re: [xml-dev] Handling very large instance docs
  - From: Kevin Jones <kjouk@yahoo.co.uk>

References:
- Handling very large instance docs
  - From: Andy Greener <andy@gid.co.uk>
- Re: [xml-dev] Handling very large instance docs
  - From: Joel Bender <jjb5@cornell.edu>
