[Summary] Eager and Just-in-Time loading of XML Schema documents, compiled documents, enhancing performance, streaming
- From: "Costello, Roger L." <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Sat, 7 Aug 2010 09:46:30 -0400
Hi Folks,
Here is a summary of the recent discussions. Please notify me of any errors. /Roger
--------------------------------------------------------------------------------
The following XML document references two XML Schemas, Library.xsd and Book.xsd:
<?xml version="1.0"?>
<Library xmlns="http://www.library.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation=
              "http://www.library.org
               Library.xsd">
    <Books>
        <Book xmlns="http://www.book.org"
              xsi:schemaLocation=
                   "http://www.book.org
                    Book.xsd">
            <Title>My Life and Times</Title>
            <Author>Paul McCartney</Author>
            <Date>1998</Date>
            <ISBN>1-56592-235-2</ISBN>
            <Publisher>Macmillan Publishing</Publisher>
        </Book>
        ...
    </Books>
</Library>
When does an XML Schema validator load (into memory) the XML Schema documents? When will Library.xsd and Book.xsd be loaded?
Answer: It depends on whether they are coupled or independent.
-------------------------
CASE #1
-------------------------
Suppose Library.xsd and Book.xsd are coupled, i.e., Library.xsd imports Book.xsd.
Here's a snippet of Library.xsd:
<xs:import namespace="http://www.book.org" schemaLocation="Book.xsd"/>

<xs:complexType name="BooksType">
    <xs:sequence>
        <xs:element xmlns:bk="http://www.book.org" ref="bk:Book"/>
    </xs:sequence>
</xs:complexType>
Both schemas will be loaded at the same time, when the validator hits the <Library> element.
This is called eager loading. The validator loads the schema that the xsi:schemaLocation attribute references, plus (recursively) all the schemas that schema imports and includes.
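One way to observe eager loading is through the standard Java validation API (javax.xml.validation), which Xerces-J implements. The sketch below is a minimal illustration under stated assumptions, not a description of any validator's internals. It writes hypothetical, cut-down versions of Library.xsd and Book.xsd to a temporary directory, compiles Library.xsd, and records every external schema document the SchemaFactory asks its LSResourceResolver for. Because Library.xsd imports Book.xsd, Book.xsd should be requested while Library.xsd is being compiled, before any instance document is read. The class name EagerLoadingDemo and the helper compileAndRecordLoads are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class EagerLoadingDemo {

    // Hypothetical, cut-down Book.xsd: just enough to declare the Book element.
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.book.org' elementFormDefault='qualified'>"
        + "<xs:element name='Book' type='xs:anyType'/>"
        + "</xs:schema>";

    // Hypothetical, cut-down Library.xsd: imports Book.xsd, as in Case #1.
    static final String LIBRARY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:bk='http://www.book.org'"
        + " targetNamespace='http://www.library.org' elementFormDefault='qualified'>"
        + "<xs:import namespace='http://www.book.org' schemaLocation='Book.xsd'/>"
        + "<xs:complexType name='BooksType'><xs:sequence>"
        + "<xs:element ref='bk:Book' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:schema>";

    /** Compiles Library.xsd and returns the schema locations the factory had to resolve. */
    static List<String> compileAndRecordLoads(Path dir) throws IOException, SAXException {
        Files.writeString(dir.resolve("Book.xsd"), BOOK_XSD);
        Files.writeString(dir.resolve("Library.xsd"), LIBRARY_XSD);

        List<String> requested = new ArrayList<>();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Log each external schema document the factory asks for; returning null
        // tells the factory to resolve it as if no resolver were installed.
        factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            requested.add(systemId);
            return null;
        });
        // Compiling Library.xsd also pulls in everything it imports or includes.
        factory.newSchema(dir.resolve("Library.xsd").toFile());
        return requested;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("eager-loading");
        System.out.println("Requested while compiling Library.xsd: " + compileAndRecordLoads(dir));
    }
}
```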
-------------------------
CASE #2
-------------------------
Suppose Library.xsd and Book.xsd are independent.
Here's a snippet of Library.xsd:
<xs:complexType name="BooksType">
    <xs:sequence>
        <xs:any namespace="http://www.book.org"/>
    </xs:sequence>
</xs:complexType>
Library.xsd will be loaded when the validator hits the <Library> element. Book.xsd won't be loaded until the validator hits the <Book> element.
This is called just-in-time loading. The validator loads the schema only when it's needed.
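Just-in-time loading can be sketched with the same Java API. Calling SchemaFactory.newSchema() with no arguments returns a Schema that takes its schema documents from the instance's xsi:schemaLocation hints. The sketch below assumes the JDK's built-in, Xerces-derived validator, and that its Validator consults the LSResourceResolver when it follows those hints. The class name, the helper validateAndRecord, and the cut-down schemas are hypothetical.

The sketch logs two kinds of events in the order they occur:

- each element start that reaches the content handler;
- each schema document the Validator requests.

If the validator behaves as described for Case #2, "load Book.xsd" should appear after "start Books" and before "start Book".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class JustInTimeDemo {

    // Hypothetical Library.xsd for Case #2: no import, only a wildcard for book.org content.
    static final String LIBRARY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns='http://www.library.org' targetNamespace='http://www.library.org'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='Library'><xs:complexType><xs:sequence>"
        + "<xs:element name='Books' type='BooksType'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "<xs:complexType name='BooksType'><xs:sequence>"
        + "<xs:any namespace='http://www.book.org' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:schema>";

    // Hypothetical, cut-down Book.xsd.
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.book.org' elementFormDefault='qualified'>"
        + "<xs:element name='Book'><xs:complexType><xs:sequence>"
        + "<xs:element name='Title' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Instance: each schema is named at the point where it is first needed.
    static final String LIBRARY_XML =
        "<Library xmlns='http://www.library.org'"
        + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
        + " xsi:schemaLocation='http://www.library.org Library.xsd'>"
        + "<Books><Book xmlns='http://www.book.org'"
        + " xsi:schemaLocation='http://www.book.org Book.xsd'>"
        + "<Title>My Life and Times</Title></Book></Books></Library>";

    /** Validates Library.xml, logging element starts and schema loads in the order they occur. */
    static List<String> validateAndRecord(Path dir) throws IOException, SAXException {
        Files.writeString(dir.resolve("Library.xsd"), LIBRARY_XSD);
        Files.writeString(dir.resolve("Book.xsd"), BOOK_XSD);
        Path xml = dir.resolve("Library.xml");
        Files.writeString(xml, LIBRARY_XML);

        List<String> log = new ArrayList<>();
        // newSchema() with no arguments: schemas come from xsi:schemaLocation hints.
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator validator = factory.newSchema().newValidator();
        validator.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            log.add("load " + systemId);
            return null; // fall back to default resolution
        });
        // Receives element events after the validator has processed them, in document order.
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + localName);
            }
        };
        // The system ID lets the relative hints (Library.xsd, Book.xsd) resolve against dir.
        validator.validate(new SAXSource(new InputSource(xml.toUri().toString())),
                           new SAXResult(recorder));
        return log;
    }

    public static void main(String[] args) throws Exception {
        validateAndRecord(Files.createTempDirectory("jit-loading")).forEach(System.out::println);
    }
}
```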
-------------------------
CASE #3
-------------------------
Suppose Library.xsd imports and includes some XML Schemas (but not Book.xsd).
Here's a snippet of Library.xsd:
<xs:import namespace="http://www.example.org" schemaLocation="Example.xsd"/>
<xs:include schemaLocation="Author.xsd"/>
<xs:include schemaLocation="Title.xsd"/>
<xs:include schemaLocation="Date.xsd"/>
When Library.xsd is loaded, the schemas it imports and includes will also be loaded (eager loading). Book.xsd is not loaded until the validator hits the <Book> element (just-in-time loading). Thus, here we see a combination of eager and just-in-time loading.
I have confirmed that the following XML Schema validators have the eager and just-in-time loading behavior described above:
SAXON (Java and .NET) and Xerces-J
I have no information on these validators:
Xerces-C++, Xerces-Perl, Libxml, MSXML, or XSV.
EXPLOITING JUST-IN-TIME LOADING TO ENHANCE PERFORMANCE
Consider this scenario:
1. Your XML document is very large.
2. The XML Schemas that will be used to validate the XML document are independent (or, the XML Schemas can be partitioned into independent sets).
One way to design your XML document is to specify all the XML Schemas upfront:
<?xml version="1.0"?>
<Document xmlns="http://www.s1.org"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation=
               "http://www.s1.org
                S1.xsd
                http://www.s2.org
                S2.xsd
                ...
                http://www.sn.org
                Sn.xsd">
The disadvantage of this approach is that all the schemas will be loaded at once (eager loading). If there are many schemas, this could be slow.
A second approach is to specify a schema at the point where it's first needed. This will enable you to exploit the just-in-time loading capability of schema validators. This is illustrated here:
<?xml version="1.0"?>
<Document xmlns="http://www.s1.org"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation=
               "http://www.s1.org
                S1.xsd">
    <Element-A>...</Element-A>
    <Element-B>...</Element-B>
    <Element-C xmlns="http://www.s2.org"
               xsi:schemaLocation=
                    "http://www.s2.org
                     S2.xsd">
        <Element-D>...</Element-D>
        <Element-E>...</Element-E>
        ...
    </Element-C>
    ...
</Document>
S1.xsd will be loaded when the validator hits the <Document> element. S2.xsd won't be loaded until the validator hits the <Element-C> element. And so forth. This approach exploits just-in-time loading of XML Schema documents.
If the XML document is streamed, this approach may yield significant performance savings: validation of the early parts of the document can begin without waiting for every schema to be loaded.
USING COMPILED XML SCHEMAS TO ENHANCE PERFORMANCE
Another technique that may be used to enhance performance is to compile the XML Schema documents and save the compiled version. Then, when you want to validate the XML document, you use the compiled file (rather than loading the XML Schema documents, compiling them, and then validating).
SAXON can compile schemas in this way. Michael Kay writes:
With Saxon, for example, I would advise you to save a
.SCM file representing the compiled schema; reloading the schema from a
.SCM file should be significantly faster than rebuilding it from source
schema documents.
Rich Salz reports that the DataPower products also compile their files first:
The DataPower products work this way. XSLT, XSD, WSDL, XACML, etc., files
are compiled to object code the first time they're used (or you can
pre-load the object cache). Then when actually "used" the object code is
executed directly by the CPU(s).
I do not know if the other schema validators provide the option to compile XML Schemas.
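For comparison, the JAXP validation API that ships with Java offers an in-memory form of the same idea:

- SchemaFactory.newSchema parses and checks the schema documents once and returns an immutable, thread-safe Schema object.
- Each validation then needs only a Validator obtained from that Schema.

Unlike Saxon's .SCM file, this compiled form is not saved to disk. It therefore helps when one process validates many documents, not across separate runs. The sketch below assumes a cut-down, hypothetical Book.xsd. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class CompiledSchemaReuse {

    // Hypothetical, cut-down Book.xsd (only a Title child).
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.book.org' elementFormDefault='qualified'>"
        + "<xs:element name='Book'><xs:complexType><xs:sequence>"
        + "<xs:element name='Title' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Compile once: parse and check the schema, yielding an immutable, thread-safe Schema. */
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    /** Validate many: reuse the compiled Schema; each Validator is cheap but not thread-safe. */
    static List<Boolean> validateAll(Schema schema, List<String> documents) throws IOException {
        List<Boolean> results = new ArrayList<>();
        for (String doc : documents) {
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(doc)));
                results.add(true);
            } catch (SAXException invalid) {
                results.add(false);
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        Schema book = compile(BOOK_XSD); // compilation cost is paid once
        List<String> docs = List.of(
            "<Book xmlns='http://www.book.org'><Title>My Life and Times</Title></Book>",
            "<Book xmlns='http://www.book.org'><Author>Paul McCartney</Author></Book>");
        System.out.println(validateAll(book, docs));
    }
}
```

Running main prints [true, false]. The second document is rejected because the cut-down Book.xsd expects a Title rather than an Author.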