7/11/2002 7:32:19 PM, firstname.lastname@example.org wrote:
> 2. Most people mention SAX can handle files larger
>than memory, but I am thinking, is this really the case,
>because files are read into the kernel buffer, so large
>files still have to be read into the memory, just not in
>user space. Am I right?
DOM builders generally load the entire document into a tree structure.
SAX operates at parse time; it can call a user-defined function
for each element, attribute list, entity reference, etc. The application
can choose to either process the XML data and throw it away (meaning
that the total size of the document is independent of the memory
usage) or build another data structure, store the data in a DBMS,
or whatever. This provides the usual tradeoff -- more work for the
application programmer but more control over resource usage.
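As a minimal sketch of that process-and-discard style, here is a SAX handler using Python's standard xml.sax module (the document string and the CountHandler class are just illustrations, not part of any particular application):

```python
import xml.sax
from io import StringIO

class CountHandler(xml.sax.ContentHandler):
    """Tally element names as they stream past; nothing else is retained."""
    def __init__(self):
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per element at parse time; the parser keeps no tree.
        self.counts[name] = self.counts.get(name, 0) + 1

doc = "<orders><order id='1'/><order id='2'/></orders>"
handler = CountHandler()
xml.sax.parse(StringIO(doc), handler)
print(handler.counts)  # {'orders': 1, 'order': 2}
```

Memory use here depends only on what the handler chooses to keep (a small dict of tallies), not on how many `<order>` elements stream past.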
> 3. DOM is memory-thirsty, according to most articles I
>read. So DOM's performance lags, does anyone run any type
>of profiling, and I am interested in why it is memory
>hungry, and poor in terms of performance.
It is quite true that if one simply defines classes that
directly implement all the DOM interfaces, each Node will be
fairly large because of all the properties and methods defined
on the basic Node interface. The DOM exposes several
different models of an XML document: a tree with parents,
children, and siblings; lists of nodes containing lists of
nodes; a more OO conception of Document, Element, Attribute, etc.
objects; and a more abstract model where the document is
traversed via iterators.
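To make the tree-navigation model concrete, here is a small sketch using Python's stdlib xml.dom.minidom (the `<book>` document is invented for illustration):

```python
from xml.dom.minidom import parseString

doc = parseString("<book><title>SAX vs DOM</title><price>10</price></book>")
root = doc.documentElement      # Element view of the document
title = root.firstChild         # navigate via parent/child/sibling links
price = title.nextSibling
print(price.parentNode is root) # True -- every node carries links to its relatives
print(price.firstChild.data)    # 10
```

All of those cross-links (parent, siblings, children) exist on every node at once, which is part of why a straightforward DOM implementation is heavy per node.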
Still, this is an implementation issue, not intrinsic to the DOM
API. There are some DOM implementations that are "lazy", i.e.,
only build actual objects implementing the DOM interface when
a specific part of the document is accessed. There are also
persistent DOMs, where the parser essentially loads a database
that is then navigated and queried on demand. Both these
techniques would be less memory hungry than a straightforward
implementation of the spec.
> 4. What do people think of pull type parsers and DOM
>SAX hybrids? Are these popular and stable?
There's been a lot written on this, but you'll probably have
to sort it out for yourself. A simple Googling for
"xml pull parser performance" yields quite a number of
articles. It's probably something to consider if you have
lots of data and relatively constrained processors, but
a well-defined application. I'd say in general that
the more flexibility you need, the more you need a DOM-like
API; the more you can constrain EXACTLY what the application will
do with each bit of markup, the more you can exploit a streaming API.
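As one illustration of the pull style, Python's standard xml.etree.ElementTree.XMLPullParser lets the application ask for parse events when it is ready for them, rather than being called back SAX-style (the `<log>` chunks below are invented):

```python
from xml.etree.ElementTree import XMLPullParser

parser = XMLPullParser(events=("end",))
seen = []
# Feed the document incrementally, as it might arrive from a socket or file.
for chunk in ("<log><entry>a</entry>", "<entry>b</entry></log>"):
    parser.feed(chunk)
    for event, elem in parser.read_events():  # pull: we decide when to consume
        if elem.tag == "entry":
            seen.append(elem.text)
            elem.clear()                      # discard processed data
print(seen)  # ['a', 'b']
```

The application drives the loop, so it controls both when parsing advances and how much parsed data stays in memory.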
> 5. Is it possible for SAX to support XSLT?
Well, several (most?) XSLT implementations can use a SAX parser
to build the tree for transformation. Strictly speaking, however,
you don't get the unbounded-document-size / memory-efficiency
advantages of SAX here: a conformant XSLT implementation
must keep the entire document around, because the stylesheet can
refer to arbitrary pieces of it.
There are extensions to SAXON, I believe, that support more
efficient use of memory by having the user tell the XSLT
engine which sections of the document to look at ... see the
<saxon:preview> extension element. There are also occasional
discussions of "streaming XSLT" processors (I don't know if
any actually exist in a stable, available form) but they would
have to operate on a subset of XSLT. I should probably shut up
and let someone who knows what they're talking about explain the details.