The Xerces-C++ parser has the capability of caching grammars for
subsequent reuse. Depending on the complexity of the grammar and the
instance documents, doing so can give a significant performance boost.
However, the way W3C Schema grammars are cached seems a bit strange to me.
All "no-namespace" schemas are considered equivalent: a no-namespace
schema is stored in the pool of cached grammars using the key "".
This has ... problematic effects.
Rather than invent a new example, I'll pinch one from the Xerces mailing list:
> First we parse a document based on schema A with root element A_root.
> The schema is cached on "". Everything is fine.
> Then we parse another document based on schema A. The cache finds the
> schema for "" and validates. Everything is fine.
> THEN we parse a document based on schema B with root element B_root.
> The parser looks in the cache, finds the schema for "" (type A) and validates.
> This of course results in a shitload of errors and a failed parse.
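To make the failure concrete, here is a toy model of a grammar pool that keys cached schemas only by target namespace, as described above. It is a Python sketch, not the Xerces-C++ API: `NamespaceKeyedPool` and `grammar_for` are hypothetical names, and representing a "grammar" by its schema file name is a simplifying assumption.

```python
# Toy model of a grammar pool keyed only by target namespace.
# Hypothetical names; this is NOT how Xerces-C++ exposes its grammar pool.

class NamespaceKeyedPool:
    def __init__(self):
        self._grammars = {}  # target namespace -> cached grammar (here: file name)

    def grammar_for(self, target_namespace, schema_location):
        """Return the grammar cached for target_namespace; schema_location is
        loaded and cached only if nothing is stored under that namespace yet."""
        if target_namespace not in self._grammars:
            self._grammars[target_namespace] = schema_location
        return self._grammars[target_namespace]


pool = NamespaceKeyedPool()
# Documents 1 and 2 use no-namespace schema A, which gets cached under "".
first = pool.grammar_for("", "A.xsd")
second = pool.grammar_for("", "A.xsd")
# Document 3 asks for no-namespace schema B, but "" already maps to A.
third = pool.grammar_for("", "B.xsd")
print(first, second, third)  # A.xsd A.xsd A.xsd
```

The third lookup silently returns schema A, which is exactly the "shitload of errors" case: the key carries no information that could tell A and B apart.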
To me, this behaviour seems wrong. However, the Xerces folk think that
> you shouldn't use schema caching if you have different schemas sharing the
> same namespace (being this the empty one or not). A namespace URI is a
> "domain", is like saying "when I am talking about music a record is
> something that has songs in it; when talking about sports a record is the
> best performance". You are using two schemas, sharing the same "domain"
> label: nothing wrong with that, provided that you don't mix them.
What's the right answer?
An additional but related question: is Xerces right to cache W3C Schemas
that _do_ have a target namespace using that target namespace as the key?
For that to be correct, the target namespace of the schema must be
considered to play an equivalent role to a DTD's PUBLIC identifier ...
which doesn't seem unreasonable, but may not be true.
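One way to see what is at stake is to compare two hypothetical cache-key policies side by side. Neither function below is taken from Xerces; both are illustrative assumptions. Keying on the namespace alone treats it like a PUBLIC identifier; keying on namespace plus schema location is closer to pairing a PUBLIC with a SYSTEM identifier.

```python
# Hypothetical comparison of two cache-key policies; not Xerces-C++ code.

def cache_key_by_namespace(target_namespace, schema_location):
    # Namespace acts like a PUBLIC identifier: every schema with the same
    # target namespace is treated as the same grammar.
    return target_namespace

def cache_key_by_namespace_and_location(target_namespace, schema_location):
    # Closer to a PUBLIC + SYSTEM pair: schemas sharing a namespace but
    # living in different documents get separate cache entries.
    return (target_namespace, schema_location)

def distinct_grammars(key_fn, requests):
    """Count the separate cache entries a sequence of
    (target_namespace, schema_location) requests would produce."""
    return len({key_fn(ns, loc) for ns, loc in requests})

requests = [("", "A.xsd"), ("", "A.xsd"), ("", "B.xsd")]
print(distinct_grammars(cache_key_by_namespace, requests))               # 1
print(distinct_grammars(cache_key_by_namespace_and_location, requests))  # 2
```

The second policy is not free either: the same schema reached through two different URLs would be cached twice, so it trades wrong answers for possible duplication.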