   Re: [xml-dev] Xerces, schema caching, and namespaces

Daniel McLean wrote:

>The Xerces-C++ parser has the capability of caching grammars for
>subsequent reuse.  Depending on the complexity of the grammar and the
>instance documents, doing so can give a significant performance boost.
>However, the way W3C Schema grammars are cached seems a bit strange to me.
>
>All "no-namespace" schemas are considered equivalent: a no-namespace
>schema is stored in the pool of cached grammars using the key "".
>This has ... problematic effects.
>
I worked on caching a long time back. IIRC this is the default behavior,
and it can be changed: you can write your own logic to specify the caching
criteria, e.g. targetNamespace + schemaLocation.
By default, Xerces stores the grammar using the targetNamespace as the key.
You can also have multiple grammar pools holding different types of
grammars, but then you have to write logic to set the appropriate pool for
each instance document parsed, which may be cumbersome. You might like to
check the JAXP 1.3 Schema Validation Framework [1] [2], which looks at
caching behavior in an entirely different way.
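
To make the JAXP suggestion concrete, here is a minimal sketch of the idea:
the application compiles each schema into a javax.xml.validation.Schema
object itself and chooses which one to validate against, so nothing is
looked up by namespace behind its back. The class ExplicitSchemaCache, the
keys "A.xsd" and "B.xsd", and the two tiny no-namespace schemas are made up
for illustration. Only the javax.xml.validation calls are real API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical helper: caches compiled schemas under caller-chosen keys. */
final class ExplicitSchemaCache {
    private final SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    private final Map<String, Schema> schemas = new HashMap<>();

    /** Compiles the schema once and remembers it under the given key. */
    void register(String key, String xsdText) {
        try {
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsdText)));
            schemas.put(key, schema);
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile: " + key, e);
        }
    }

    /** Validates the document against the schema registered under the key. */
    boolean isValid(String key, String xml) {
        Schema schema = schemas.get(key);
        if (schema == null) {
            throw new IllegalArgumentException("no schema registered as " + key);
        }
        try {
            // A Validator is not thread-safe, so one is created per call;
            // the compiled Schema itself is immutable and reusable.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors are thrown.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a no-namespace schema with a single string-typed root element. */
    static String xsdWithRoot(String rootName) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='" + rootName + "' type='xs:string'/>"
             + "</xs:schema>";
    }

    public static void main(String[] args) {
        ExplicitSchemaCache cache = new ExplicitSchemaCache();
        cache.register("A.xsd", xsdWithRoot("A_root"));
        cache.register("B.xsd", xsdWithRoot("B_root"));
        System.out.println(cache.isValid("A.xsd", "<A_root>x</A_root>"));
        System.out.println(cache.isValid("B.xsd", "<B_root>x</B_root>"));
        System.out.println(cache.isValid("A.xsd", "<B_root>x</B_root>"));
    }
}
```

Both schemas here have no target namespace, yet they never get confused,
because the caller picks the Schema object explicitly for each document.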

Neeraj

[1] http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/validation/package-summary.html
[2] https://jaxp.dev.java.net

>Rather than invent a new example, I'll pinch one from the Xerces mailing
>list:
>  
>
>>First we parse a document based on schema A with root element A_root.
>>The schema is cached on "". Everything is fine.
>>Then we parse another document based on schema A. The cache finds the
>>schema for "" and validates. Everything is fine.
>>THEN we parse a document based on schema B with root element B_root.
>>The parser looks in the cache, finds the schema for "" (type A) and validates.
>>This of course results in a shitload of errors and a failed parse.
>>    
>>
> [from http://marc.theaimsgroup.com/?l=xerces-c-dev&m=107598912614145&w=2]
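
The collision in this example can be reproduced with a toy model. The
sketch below is not Xerces code: GrammarKeySketch, its method names and the
"grammar loaded from ..." strings are all hypothetical stand-ins. It only
contrasts a pool keyed by target namespace alone with one keyed by
targetNamespace + schemaLocation, the alternative criterion mentioned above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of two grammar-pool keying strategies (not Xerces internals). */
final class GrammarKeySketch {

    /** Pool keyed by target namespace only: the first grammar cached wins. */
    static String lookupByNamespace(Map<String, String> pool,
                                    String namespace, String location) {
        // A hit on the namespace is returned whatever schema the document names.
        return pool.computeIfAbsent(namespace, k -> "grammar loaded from " + location);
    }

    /** Pool keyed by the pair (target namespace, schema location). */
    static String lookupByNamespaceAndLocation(Map<List<String>, String> pool,
                                               String namespace, String location) {
        List<String> key = Arrays.asList(namespace, location);
        return pool.computeIfAbsent(key, k -> "grammar loaded from " + location);
    }

    public static void main(String[] args) {
        Map<String, String> byNamespace = new HashMap<>();
        // Both schemas have no target namespace, so both use the key "".
        System.out.println(lookupByNamespace(byNamespace, "", "A.xsd"));
        System.out.println(lookupByNamespace(byNamespace, "", "B.xsd"));

        Map<List<String>, String> byPair = new HashMap<>();
        System.out.println(lookupByNamespaceAndLocation(byPair, "", "A.xsd"));
        System.out.println(lookupByNamespaceAndLocation(byPair, "", "B.xsd"));
    }
}
```

With the namespace-only pool the second lookup returns the grammar that was
cached for A.xsd, which is the failure described above. With the composite
key each schema gets its own entry.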
>
>To me, this behaviour seems wrong.  However, the Xerces folk think that
>it's right:
>  
>
>>you shouldn't use schema caching if you have different schemas sharing the 
>>same namespace (being this the empty one or not). A namespace URI is a 
>>"domain", is like saying "when I am talking about music a record is 
>>something that has songs in it; when talking about sports a record is the 
>>best performance". You are using two schemas, sharing the same "domain" 
>>label: nothing wrong with that, provided that you don't mix them.
>>    
>>
> [from http://marc.theaimsgroup.com/?l=xerces-c-dev&m=107599141217514&w=2]
>
>What's the right answer?
>
>An additional but related question: is Xerces right to cache W3C Schemas
>that _do_ target namespaces based on the target namespace of the schemas?
>For that to be correct, the target namespace of the schema must be
>considered to play an equivalent role to a DTD's PUBLIC identifier ...
>which doesn't seem unreasonable, but may not be true.
>
>Daniel
>