Re: [xml-dev] [Summary] Eager and Just-in-Time loading of XML Schema documents, compiled documents, enhancing performance, streaming
- From: Michael Kay <mike@saxonica.com>
- To: xml-dev@lists.xml.org
- Date: Sat, 07 Aug 2010 18:41:07 +0100
> But if you're loading the same schema over and over again on each
> validation episode it can be very expensive and I have seen many scenarios
> (particularly industry standards) where the set of schema documents is
> several orders of magnitude larger than the typical instance documents
> being validated.
Yes, that is certainly true of FpML, to take one example. Most instance
documents use a tiny subset of the declarations defined in the schema,
because they cover one kind of financial transaction when the schema
allows for hundreds of different kinds.
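The "load once, validate many" point in the quoted text can be sketched as follows. This is only an illustration under stated assumptions: `load_schema` is a hypothetical stand-in for an expensive schema compilation step, the file name `fpml-main.xsd` is made up, and the "validation" is reduced to a lookup of the root element name. It is not any real processor's API.

```python
from functools import lru_cache

COMPILE_CALLS = 0  # counts how often the expensive step actually runs


@lru_cache(maxsize=None)
def load_schema(schema_location: str) -> dict:
    """Hypothetical stand-in for parsing, checking and compiling a schema."""
    global COMPILE_CALLS
    COMPILE_CALLS += 1
    # A real processor would do all of its expensive work here.
    return {"location": schema_location, "declarations": ("trade", "party")}


def validate(instance_root: str, schema_location: str) -> bool:
    """Validate one instance against the cached compiled schema."""
    schema = load_schema(schema_location)  # compiled only on the first call
    return instance_root in schema["declarations"]


# Three validation episodes, one compilation.
results = [validate(root, "fpml-main.xsd") for root in ("trade", "party", "trade")]
```

The cost of loading is paid once per schema location rather than once per instance, which matters most when the schema is far larger than the documents being validated.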
But that's not the only reason. Loading a schema involves a lot more
than just parsing the source XML documents that define the schema. It's
necessary to validate that the schema meets all the constraints defined
in the spec, some of which (like the rules for UPA, Unique Particle
Attribution, and for type A being a valid restriction of type B) are
highly complex. It's also typically necessary to generate and
determinize finite state automata for each complex type defined in the
schema; in the worst case, the memory and processing requirements of
the textbook algorithms for doing this can be very high.
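To make the worst-case remark concrete, here is a minimal sketch of the textbook subset construction for determinizing an automaton. It is not Saxon's algorithm; the function names and the example NFA ("the n-th symbol from the end is 'a'") are assumptions chosen because that language is a standard case where the deterministic automaton needs 2^n states.

```python
from collections import deque


def determinize(start, delta, accepting):
    """Subset construction: NFA without epsilon moves -> DFA.

    delta maps (state, symbol) to a set of successor states.
    Each DFA state is a frozenset of NFA states.
    """
    symbols = {sym for (_, sym) in delta}
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in symbols:
            target = frozenset(
                nxt for state in current for nxt in delta.get((state, sym), ())
            )
            if not target:
                continue  # no transition on this symbol
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return seen, dfa_delta, dfa_accepting


def nth_from_end_nfa(n):
    """NFA over {a, b} accepting strings whose n-th symbol from the end is 'a'."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return 0, delta, {n}


# The NFA has n + 1 states; the DFA built from it has 2**n.
dfa_states, _, _ = determinize(*nth_from_end_nfa(10))
```

With n = 10 the 11-state NFA becomes a 1024-state DFA, which is the kind of blow-up that makes the memory and processing cost of loading a schema hard to bound in advance.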
(Saxon actually creates the FSA for each complex type in the schema
eagerly, rather than waiting until an instance of that type needs to be
validated. That's because some errors in the schema, for example UPA
violations, are detected as a spin-off of the algorithm for FSA
generation; and I don't like the idea of detecting and reporting schema
errors during instance validation, especially while validating the 100th
instance document when 99 others have already been successfully
validated. This might be a case where a user switch could help: if the
user is prepared to assert that the schema is already known to be valid,
Saxon could organize the processing in a way that trades better
performance for worse error diagnostics.)
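The trade-off described above can be sketched roughly as follows. This is a toy model, not Saxon's implementation: content models are reduced to plain sequences of `(element_name, optional)` particles, and `SchemaError`, `build_fsa`, `CompiledSchema` and the `eager` switch are all hypothetical names. The point it illustrates is that the UPA check falls out of FSA construction, so deferring construction also defers the error.

```python
class SchemaError(Exception):
    """Raised when a type's content model is not a valid schema component."""


def build_fsa(type_name, particles):
    """Build a deterministic FSA for a sequence of (element_name, optional).

    State i means "particles before index i are consumed or skipped".
    A UPA violation shows up as two competing transitions on one name.
    """
    transitions = {}
    for state in range(len(particles)):
        for j in range(state, len(particles)):
            name, optional = particles[j]
            if (state, name) in transitions:
                raise SchemaError(
                    f"UPA violation in type {type_name!r}: "
                    f"element {name!r} is ambiguous"
                )
            transitions[(state, name)] = j + 1
            if not optional:
                break  # a required particle cannot be skipped
    accepting = {
        i for i in range(len(particles) + 1)
        if all(opt for _, opt in particles[i:])
    }
    return transitions, accepting


class CompiledSchema:
    def __init__(self, types, eager=True):
        self._types = types
        self._fsa = {}
        if eager:
            for name in types:
                self.fsa_for(name)  # schema errors surface at load time

    def fsa_for(self, type_name):
        """Return the FSA for a type, building it on first use."""
        if type_name not in self._fsa:
            self._fsa[type_name] = build_fsa(type_name, self._types[type_name])
        return self._fsa[type_name]


SCHEMA_TYPES = {
    "Trade": [("header", False), ("party", False)],
    "Broken": [("a", True), ("a", False)],  # optional a, then required a
}

# Lazy loading succeeds even though the type "Broken" violates UPA.
lazy = CompiledSchema(SCHEMA_TYPES, eager=False)
trade_fsa = lazy.fsa_for("Trade")
```

With `eager=True` the constructor raises `SchemaError` immediately. With `eager=False` the error appears only when some instance first needs the type `Broken`, which is exactly the late diagnostic the eager approach is meant to avoid.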
Michael Kay
Saxonica
- References:
- [Summary] Eager and Just-in-Time loading of XML Schema documents, compiled documents, enhancing performance, streaming
- From: "Costello, Roger L." <costello@mitre.org>
- Re: [xml-dev] [Summary] Eager and Just-in-Time loading of XML Schema documents, compiled documents, enhancing performance, streaming
- From: Mukul Gandhi <gandhi.mukul@gmail.com>
- Re: [xml-dev] [Summary] Eager and Just-in-Time loading of XML Schema documents, compiled documents, enhancing performance, streaming
- From: Michael Glavassevich <mrglavas@ca.ibm.com>