OASIS Mailing List Archives

Re: [xml-dev] [Summary] Eager and Just-in-Time loading of XML Schema documents, compiled documents, enhancing performance, streaming

 > But if you're loading the same schema over and over again on each
 > validation episode it can be very expensive, and I have seen many scenarios
 > (particularly industry standards) where the set of schema documents is
 > several orders of magnitude larger than the typical instance documents
 > being validated.

Yes, that is certainly true of FpML to take one example. Most instance 
documents use a tiny subset of the declarations defined in the schema, 
because they cover one kind of financial transaction when the schema 
allows for hundreds of different kinds.

But that's not the only reason schema loading is expensive. Loading a
schema involves a lot more than just parsing the source XML documents
that define the schema. It's necessary to validate that the schema meets
all the constraints defined in the spec, some of which (like the rules
for UPA, Unique Particle Attribution, and for A being a valid restriction
of B) are highly complex; and it's typically necessary to generate and
determinize finite state automata for each complex type defined in the
schema: in the worst case, the memory and processing requirements of the
textbook algorithms for doing this can be very high.
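
As a rough illustration of why the textbook approach can be costly, the following Python sketch implements the standard subset construction and runs it on a classic worst-case automaton. This is an assumption-laden toy: the automaton is a generic NFA over the symbols `a` and `b`, not a real XSD content model, and the function names `determinize` and `nth_from_last_nfa` are made up for the example. It is not how Saxon or any particular processor is implemented.

```python
from itertools import chain


def determinize(nfa, start, alphabet):
    """Textbook subset construction (no epsilon moves).

    nfa maps (state, symbol) -> set of successor states.
    Returns (dfa_states, dfa_transitions), where each DFA state is a
    frozenset of NFA states.
    """
    start_set = frozenset([start])
    seen = {start_set}
    todo = [start_set]
    dfa = {}
    while todo:
        current = todo.pop()
        for sym in alphabet:
            # Union of all NFA moves on sym from the states in this subset.
            target = frozenset(
                chain.from_iterable(nfa.get((s, sym), ()) for s in current)
            )
            if not target:
                continue
            dfa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen, dfa


def nth_from_last_nfa(n):
    """NFA with n + 1 states accepting strings whose n-th symbol from the end is 'a'."""
    nfa = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        nfa[(i, "a")] = {i + 1}
        nfa[(i, "b")] = {i + 1}
    return nfa


if __name__ == "__main__":
    for n in (2, 4, 8):
        states, _ = determinize(nth_from_last_nfa(n), 0, "ab")
        print(n + 1, "NFA states ->", len(states), "DFA states")
```

For this family the deterministic automaton has 2^n states for an NFA of n + 1 states, which is the kind of worst-case growth that makes eager determinization of every complex type potentially expensive.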

(Saxon actually creates the FSA for each complex type in the schema 
eagerly, rather than waiting until an instance of that type needs to be 
validated. That's because some errors in the schema, for example a UPA 
violation, are detected as a spin-off of the algorithm for FSA 
generation; and I don't like the idea of detecting and reporting schema 
errors during instance validation, especially while validating the 100th 
instance document when 99 others have already been successfully 
validated. This might be a case where a user switch could help: if the 
user is prepared to assert that the schema is already known to be valid, 
Saxon could organize the processing in a way that trades better 
performance for worse error diagnostics.)
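
The trade-off described above can be sketched in a few lines of Python. Everything here is hypothetical: `CompiledSchema`, `build_fsa`, `SchemaError`, and the `assume_valid` switch are invented names, and the "content model" is reduced to a single choice of element particles so that a UPA violation is simply two particles with the same element name. It only shows where errors surface under each strategy, not how a real processor builds its automata.

```python
class SchemaError(Exception):
    """Raised when a (toy) schema constraint is violated."""


def build_fsa(type_name, choice):
    """Build a one-state 'automaton' for a choice of element particles.

    choice is a list of (particle_id, element_name) pairs. The UPA check
    falls out of construction: two particles competing for the same
    element name cannot both own the transition.
    """
    transitions = {}
    for particle_id, name in choice:
        if name in transitions:
            raise SchemaError(
                f"type {type_name}: UPA violation on element '{name}'"
            )
        transitions[name] = particle_id
    return transitions


class CompiledSchema:
    def __init__(self, types, assume_valid=False):
        # types maps a complex type name to its toy content model.
        self._types = types
        self._fsas = {}
        if not assume_valid:
            # Eager: every type is processed now, so schema errors are
            # reported at load time, before any instance is validated.
            for name, model in types.items():
                self._fsas[name] = build_fsa(name, model)

    def fsa_for(self, type_name):
        # Lazy path: build on first use; a schema error may then surface
        # in the middle of validating some instance document.
        if type_name not in self._fsas:
            self._fsas[type_name] = build_fsa(type_name, self._types[type_name])
        return self._fsas[type_name]
```

With `assume_valid=False` a faulty type is rejected when the schema is loaded; with `assume_valid=True` loading is cheaper, but the same error only appears when the first instance of that type is encountered.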

Michael Kay
