xml-dev - Pipelines : inside or outside the parser ? (was RE: [xml-dev] Java Techn


To: 'Elena Litani' <elitani@ca.ibm.com>
Subject: Pipelines : inside or outside the parser ? (was RE: [xml-dev] Java Technology and XML : API benchmark)
From: Nicolas LEHUEN <nicolas.lehuen@ubicco.com>
Date: Wed, 13 Mar 2002 22:10:46 +0100
Cc: "'xml-dev@lists.xml.org'" <xml-dev@lists.xml.org>

Hi Elena,

It's under "Comparing DTD and XML Schema Validation Performance" :

8<----8<----8<----8<----8<----8<----8<----8<----8<----
With Xerces, validating against an XML Schema is more expensive than
validating against an equivalent DTD. When no validation is performed, using
XML Schema appears to be more performant than using DTD. This is due to the
fact that in that particular implementation of Xerces (1.4.4), the XML
Schema referenced in the input document is not even read (as revealed by
monitoring the calls to the EntityResolver). The XML Schema instance
document is therefore treated as a DTD-less document (since there is not
even a Document Type Declaration - DOCTYPE). 

When running the benchmark with a more recent version of Xerces (2.0b4),
when no validation is performed, using XML Schema is less performant than
using DTD, quasi-mirroring the results obtained when validating. In that
particular instance, monitoring the calls to the EntityResolver showed that
the XML Schema was indeed loaded and parsed. Incidentally, we can also
notice a sensible improvement of performance with this newer version of
Xerces. 
8<----8<----8<----8<----8<----8<----8<----8<----8<----

The "bug", or at least the unwanted behaviour, is that the XML Schema is
loaded and parsed even when no validation is performed. I don't know whether
this has a big impact on performance, but I think the problem would have been
easily avoided if the parsing and validating layers were clearly separated.
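
For reference, the kind of monitoring the article describes can be
approximated with a recording EntityResolver. This is only a minimal sketch,
assuming a JAXP SAX parser; the class name ResolverMonitor and the method
requestedEntities are made up for illustration. It uses a DOCTYPE reference
because that is the simplest case to show. Whether the resolver is also
consulted for XML Schema documents depends on the parser and its version.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records every external entity the parser asks for.
public class ResolverMonitor {

    static List<String> requestedEntities(String xml) throws Exception {
        List<String> requested = new ArrayList<>();

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // validation explicitly switched off
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Log the system identifier, then hand back an empty entity so
        // that nothing is fetched from the network.
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            return new InputSource(new StringReader(""));
        });
        reader.setContentHandler(new DefaultHandler());

        reader.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        // The URL is a placeholder, it is never actually opened.
        String doc = "<!DOCTYPE a SYSTEM \"http://example.invalid/a.dtd\"><a/>";
        System.out.println(requestedEntities(doc));
    }
}
```

If the list is not empty while validation is off, the parser is still
reading external resources it has no strict need for.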

Anyway, from what you say, they are separated (though I don't understand why
such a bug would exist, then). I read a part of the documentation you
pointed us to and I like the clean component system.

What I don't like, though, is that all this composition system takes place
"under the hood", with a custom interface (XNI), the whole system being
hidden under the main parsing API (JAXP). I think this system should be
fully outside of the parser, that is to say I think the parser should not be
the main API, but a part of a bigger system.

Think of it as a contrast between two designs: XNI under the hood with JAXP
as the public interface, versus JAXP used for parsing only, with something
like SAX filters or XPipe to build processing pipelines. Maybe there are
reasons that make the XNI approach more interesting, and I still have to read
further in the XNI documentation, but intuitively I prefer an approach where
processing takes place outside of the parser rather than inside.
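
To make the "outside the parser" option concrete, here is a minimal sketch
of a pipeline built from SAX filters, using the standard XMLFilterImpl
helper class. The two stages (DropDebugFilter and UpperCaseFilter) and the
class PipelineSketch are hypothetical, invented only for this illustration.
The parser obtained through JAXP is just the first link of the chain.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class PipelineSketch {

    // Hypothetical stage 1: drops every <debug> element and its content.
    static class DropDebugFilter extends XMLFilterImpl {
        private int depth = 0; // greater than 0 while inside a dropped subtree

        DropDebugFilter(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            if (depth > 0 || "debug".equals(qName)) { depth++; return; }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName)
                throws SAXException {
            if (depth > 0) { depth--; return; }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            if (depth == 0) super.characters(ch, start, length);
        }
    }

    // Hypothetical stage 2: upper-cases element names.
    static class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            super.startElement(uri, local, qName.toUpperCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName)
                throws SAXException {
            super.endElement(uri, local, qName.toUpperCase(Locale.ROOT));
        }
    }

    // Builds parser -> DropDebugFilter -> UpperCaseFilter -> collecting handler
    // and returns the element names that reach the end of the chain.
    static List<String> run(String xml) throws Exception {
        XMLReader parser =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        XMLReader pipeline = new UpperCaseFilter(new DropDebugFilter(parser));

        List<String> seen = new ArrayList<>();
        pipeline.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                seen.add(qName);
            }
        });
        pipeline.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a><debug><x/></debug><b>t</b></a>"));
    }
}
```

Each stage only knows the SAX interfaces, so stages can be added, removed or
reordered by the application without touching the parser itself.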

Regards,
Nicolas

>-----Original Message-----
>From: Elena Litani [mailto:elitani@ca.ibm.com]
>Sent: Wednesday, 13 March 2002 20:09
>To: Nicolas LEHUEN
>Subject: Re: [xml-dev] Java Technology and XML : API benchmark
>
>
>Nicolas LEHUEN wrote:
>> 
>> So the bug reported by the article from Sun has been fixed since ?
>
>To what bug you are referring..? I did not think I saw any..
>
>-- 
>Elena Litani / IBM Toronto
>

Follow-Ups:
- Re: Pipelines : inside or outside the parser ? (was RE: [xml-dev] Java Technology and XML : API benchmark)
  - From: Elena Litani <elitani@ca.ibm.com>
