xml-dev - Xerces, schema caching, and namespaces


To: xml-dev@lists.xml.org
Subject: Xerces, schema caching, and namespaces
From: Daniel McLean <daniel@mds.rmit.edu.au>
Date: Wed, 22 Dec 2004 15:42:41 +1100
User-agent: Mutt/1.4i

The Xerces-C++ parser has the capability of caching grammars for
subsequent reuse.  Depending on the complexity of the grammar and the
instance documents, doing so can give a significant performance boost.
However, the way W3C Schema grammars are cached seems a bit strange to me.

All "no-namespace" schemas are considered equivalent: a no-namespace
schema is stored in the pool of cached grammars using the key "".
This has ... problematic effects.

Rather than invent a new example, I'll pinch one from the Xerces mailing
list:
> First we parse a document based on schema A with root element A_root.
> The schema is cached on "". Everything is fine.
> Then we parse another document based on schema A. The cache finds the
> schema for "" and validates. Everything is fine.
> THEN we parse a document based on schema B with root element B_root.
> The parser looks in the cache, finds the schema for "" (type A) and validates.
> This of course results in a shitload of errors and a failed parse.
 [from http://marc.theaimsgroup.com/?l=xerces-c-dev&m=107598912614145&w=2]
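
To make the sequence in that quoted example concrete, here is a minimal
sketch of a grammar pool keyed only on target namespace.  It is not
Xerces code: Grammar, GrammarPool, lookupOrCache, validates and
runExample are all hypothetical names invented for illustration, and
"validation" is reduced to comparing root element names.  It only models
the lookup behaviour described above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a compiled schema grammar: it only remembers
// which root element the schema declares.
struct Grammar {
    std::string rootElement;
};

// Hypothetical grammar pool keyed on target namespace alone.
// A no-namespace schema is stored under the key "".
class GrammarPool {
public:
    // Returns the grammar cached for this namespace. The freshly loaded
    // grammar is stored only if the key is not already present.
    const Grammar& lookupOrCache(const std::string& targetNamespace,
                                 const Grammar& loaded) {
        // emplace leaves an existing entry untouched.
        return pool_.emplace(targetNamespace, loaded).first->second;
    }

private:
    std::map<std::string, Grammar> pool_;
};

// Toy validation: the document is "valid" if its root element matches
// the root element declared by the grammar.
bool validates(const Grammar& grammar, const std::string& docRoot) {
    assert(!docRoot.empty());
    return grammar.rootElement == docRoot;
}

struct Results {
    bool firstA;   // first document based on schema A
    bool secondA;  // second document based on schema A
    bool firstB;   // document based on schema B
};

Results runExample() {
    GrammarPool pool;
    const Grammar schemaA{"A_root"};
    const Grammar schemaB{"B_root"};
    const std::string noNamespace;  // the key "" used for no-namespace schemas

    Results r{};
    // Schema A is cached under "" and the document validates.
    r.firstA = validates(pool.lookupOrCache(noNamespace, schemaA), "A_root");
    // The cache hit returns schema A again, which is what we want.
    r.secondA = validates(pool.lookupOrCache(noNamespace, schemaA), "A_root");
    // The cache hit returns schema A even though schema B was intended.
    r.firstB = validates(pool.lookupOrCache(noNamespace, schemaB), "B_root");
    return r;
}
```

Calling runExample() in this sketch gives true for the two schema A
documents and false for the schema B document, because the pool never
looks past the "" key.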

To me, this behaviour seems wrong.  However, the Xerces folk think that
it's right:
> you shouldn't use schema caching if you have different schemas sharing the 
> same namespace (being this the empty one or not). A namespace URI is a 
> "domain", is like saying "when I am talking about music a record is 
> something that has songs in it; when talking about sports a record is the 
> best performance". You are using two schemas, sharing the same "domain" 
> label: nothing wrong with that, provided that you don't mix them.
 [from http://marc.theaimsgroup.com/?l=xerces-c-dev&m=107599141217514&w=2]

What's the right answer?

An additional but related question: is Xerces right to cache W3C Schemas
that _do_ have a target namespace using that target namespace as the key?
For that to be correct, the target namespace of the schema must be
considered to play an equivalent role to a DTD's PUBLIC identifier ...
which doesn't seem unreasonable, but may not be true.

Daniel

Follow-Ups:
- Re: [xml-dev] Xerces, schema caching, and namespaces
  - From: Bob Foster <bob@objfac.com>
- RE: [xml-dev] Xerces, schema caching, and namespaces
  - From: "Michael Kay" <mike@saxonica.com>
- Re: [xml-dev] Xerces, schema caching, and namespaces
  - From: Neeraj Bajaj <Neeraj.Bajaj@Sun.COM>
