XML.org
OASIS Mailing List Archives

 



[Summary] Eager and Just-in-Time loading of XML Schema documents, compiled documents, enhancing performance, streaming

Hi Folks,

Here is a summary of the recent discussions. Please notify me of any errors.  /Roger

--------------------------------------------------------------------------------

The following XML document references two XML Schemas, Library.xsd and Book.xsd:

<?xml version="1.0"?>
<Library xmlns="http://www.library.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation=
                    "http://www.library.org
                     Library.xsd">
    <Books>
        <Book xmlns="http://www.book.org"
              xsi:schemaLocation=
                           "http://www.book.org
                            Book.xsd">
                <Title>My Life and Times</Title>
                <Author>Paul McCartney</Author>
                <Date>1998</Date>
                <ISBN>1-56592-235-2</ISBN>
                <Publisher>Macmillan Publishing</Publisher>
        </Book>
        ... 
    </Books>
</Library>

When does an XML Schema validator load the XML Schema documents into memory? When will Library.xsd and Book.xsd be loaded?

Answer: It depends on whether the two schemas are coupled (one imports the other) or independent.

-------------------------
        CASE #1
-------------------------
Suppose Library.xsd and Book.xsd are coupled, i.e., Library.xsd imports Book.xsd.

Here's a snippet of Library.xsd:

<xs:import namespace="http://www.book.org" schemaLocation="Book.xsd"/>

<xs:complexType name="BooksType">
   <xs:sequence>
       <xs:element xmlns:bk="http://www.book.org" ref="bk:Book"/>
   </xs:sequence>
</xs:complexType>


Both schemas will be loaded at the same time, namely when the validator hits the <Library> element.

This is called eager loading. The validator loads the schemas that xsi:schemaLocation references, plus (recursively) all the schemas that those schemas import and include.
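To make the eager rule concrete, here is a toy Java model. It is hypothetical and is not how SAXON or Xerces-J are implemented. It assumes we already know which documents each schema document names in its xs:import and xs:include elements. Given that, it returns everything loaded, in order, when the validator meets the xsi:schemaLocation hint for the root schema:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of eager loading (hypothetical, not a real validator's code):
 * loading a schema document also loads, recursively, every schema
 * document named by its xs:import / xs:include elements.
 */
public class EagerLoading {

    /**
     * @param root         schema document named by an xsi:schemaLocation hint
     * @param dependencies hypothetical map: schema document -> documents it imports/includes
     * @return every schema document loaded, in load order
     */
    public static List<String> loadEagerly(String root, Map<String, List<String>> dependencies) {
        Set<String> loaded = new LinkedHashSet<>(); // insertion order = load order
        load(root, dependencies, loaded);
        return List.copyOf(loaded);
    }

    private static void load(String schema, Map<String, List<String>> deps, Set<String> loaded) {
        if (!loaded.add(schema)) {
            return; // already loaded; this also stops cyclic imports from recursing forever
        }
        for (String dep : deps.getOrDefault(schema, List.of())) {
            load(dep, deps, loaded);
        }
    }

    public static void main(String[] args) {
        // Case #1: Library.xsd imports Book.xsd, so both load together
        Map<String, List<String>> deps = Map.of("Library.xsd", List.of("Book.xsd"));
        System.out.println(loadEagerly("Library.xsd", deps)); // prints [Library.xsd, Book.xsd]
    }
}
```

The visited set doubles as the result. A schema imported from two places, or part of an import cycle, is therefore loaded only once.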


-------------------------
        CASE #2
-------------------------
Suppose Library.xsd and Book.xsd are independent. 

Here's a snippet of Library.xsd:

<xs:complexType name="BooksType">
   <xs:sequence>
       <xs:any namespace="http://www.book.org"/>
   </xs:sequence>
</xs:complexType>


Library.xsd will be loaded when the validator hits the <Library> element. Book.xsd won't be loaded until the validator hits the <Book> element.

This is called just-in-time loading. The validator loads the schema only when it's needed. 
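Just-in-time loading can be modeled the same way. The sketch below is hypothetical, and its method names and string log are invented for illustration. The model is fed start tags in document order and remembers any xsi:schemaLocation pairs it has seen. It loads a schema only when the first element from that schema's namespace arrives:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of just-in-time loading (hypothetical, not a real validator's code):
 * a schema is loaded the first time an element from its namespace appears,
 * then reused for every later element in that namespace.
 */
public class JustInTimeLoading {
    // namespace -> schema location, from xsi:schemaLocation hints seen so far
    private final Map<String, String> knownLocations = new HashMap<>();
    // namespace -> schema already loaded (a string stands in for a compiled grammar)
    private final Map<String, String> loaded = new HashMap<>();
    // records when each schema was loaded, e.g. "Book.xsd at <Book>"
    private final List<String> loadLog = new ArrayList<>();

    /** Called for each start tag in document order, with the hints on that tag. */
    public void startElement(String namespace, String localName, Map<String, String> hints) {
        hints.forEach(knownLocations::putIfAbsent); // first hint for a namespace wins
        if (!loaded.containsKey(namespace)) {
            String location = knownLocations.get(namespace);
            if (location == null) {
                throw new IllegalStateException("No schema location known for " + namespace);
            }
            loaded.put(namespace, location); // a real validator would parse and compile here
            loadLog.add(location + " at <" + localName + ">");
        }
    }

    public List<String> loadLog() {
        return List.copyOf(loadLog);
    }

    public static void main(String[] args) {
        String lib = "http://www.library.org";
        String book = "http://www.book.org";
        JustInTimeLoading v = new JustInTimeLoading();
        // Case #2, element by element
        v.startElement(lib, "Library", Map.of(lib, "Library.xsd"));
        v.startElement(lib, "Books", Map.of());
        v.startElement(book, "Book", Map.of(book, "Book.xsd"));
        v.startElement(book, "Title", Map.of());
        System.out.println(v.loadLog()); // prints [Library.xsd at <Library>, Book.xsd at <Book>]
    }
}
```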


-------------------------
        CASE #3
-------------------------
Suppose Library.xsd imports and includes some XML Schemas (but not Book.xsd). 

Here's a snippet of Library.xsd:

<xs:import namespace="http://www.example.org" schemaLocation="Example.xsd"/>

<xs:include schemaLocation="Author.xsd"/>

<xs:include schemaLocation="Title.xsd"/>

<xs:include schemaLocation="Date.xsd"/>


When Library.xsd is loaded, the schemas it imports and includes will also be loaded (eager loading). Book.xsd is not loaded until the validator hits the <Book> element (just-in-time loading). Thus, here we see a combination of eager and just-in-time loading.



I have confirmed that the following XML Schema validators have the eager and just-in-time loading behavior described above: 

    SAXON (Java and .NET) and Xerces-J

I have no information on these validators: 

    Xerces-C++, Xerces-Perl, Libxml, MSXML, or XSV.


EXPLOITING JUST-IN-TIME LOADING TO ENHANCE PERFORMANCE

Consider this scenario: 

1. Your XML document is very large.

2. The XML Schemas that will be used to validate the XML document are independent (or, the XML Schemas can be partitioned into independent sets).

One way to design your XML document is to specify all the XML Schemas upfront:

<?xml version="1.0"?>
<Document xmlns="http://www.library.org"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation=
                    "http://www.s1.org
                     S1.xsd
                     http://www.s2.org
                     S2.xsd 
                     ...
                     http://www.sn.org
                     Sn.xsd"> 

The disadvantage of this approach is that all the schemas will be loaded at once (eager loading). If there are many schemas, this could be slow.


A second approach is to specify a schema at the point where it's first needed. This will enable you to exploit the just-in-time loading capability of schema validators. This is illustrated here:

<?xml version="1.0"?>
<Document xmlns="http://www.library.org"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation=
                    "http://www.s1.org
                     S1.xsd">

    <Element-A>...</Element-A>
    <Element-B>...</Element-B>
    <Element-C xsi:schemaLocation=
                    "http://www.s2.org
                     S2.xsd">
        <Element-D>...</Element-D>
        <Element-E>...</Element-E>
        ...
    </Element-C>

S1.xsd will be loaded when the validator hits the <Document> element. S2.xsd won't be loaded until the validator hits the <Element-C> element. And so forth. This approach exploits just-in-time loading of XML Schema documents.

If the XML document is streamed, this approach may yield significant performance savings.
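Where the hints fall in a streamed document can be checked with a SAX parse. The sketch below uses the standard JAXP SAX API but does no validation. For each schema named in an xsi:schemaLocation attribute, it reports the element where that hint first appears. That is the earliest point a just-in-time validator could load the schema. The class name and the sample document are assumptions; the sample is a well-formed version of the example above with empty elements:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Reports where each xsi:schemaLocation hint first appears while streaming a document. */
public class SchemaHintScanner {
    private static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // Hypothetical well-formed version of the example document
    public static final String SAMPLE =
        "<Document xmlns='http://www.library.org'"
      + "          xmlns:xsi='" + XSI_NS + "'"
      + "          xsi:schemaLocation='http://www.s1.org S1.xsd'>"
      + "  <Element-A/>"
      + "  <Element-B/>"
      + "  <Element-C xsi:schemaLocation='http://www.s2.org S2.xsd'>"
      + "    <Element-D/>"
      + "    <Element-E/>"
      + "  </Element-C>"
      + "</Document>";

    /** Maps each schema document to the local name of the element carrying its first hint. */
    public static Map<String, String> scan(String xml) {
        Map<String, String> firstSeen = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                String hint = atts.getValue(XSI_NS, "schemaLocation");
                if (hint == null) {
                    return;
                }
                // The value is a whitespace-separated list of "namespace location" pairs
                String[] tokens = hint.trim().split("\\s+");
                for (int i = 1; i < tokens.length; i += 2) {
                    firstSeen.putIfAbsent(tokens[i], localName);
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so the xsi: prefix resolves to XSI_NS
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return firstSeen;
    }

    public static void main(String[] args) {
        System.out.println(scan(SAMPLE)); // prints {S1.xsd=Document, S2.xsd=Element-C}
    }
}
```

SAX delivers start tags one at a time without building a tree. A streaming validator therefore sees the S2.xsd hint only when it reaches <Element-C>.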


USING COMPILED XML SCHEMAS TO ENHANCE PERFORMANCE

Another technique that may be used to enhance performance is to compile the XML Schema documents and save the compiled version. Then, when you want to validate the XML document, you use the compiled file (rather than loading the XML Schema documents, compiling them, and then validating).

SAXON supports compiling schemas in this way. Michael Kay writes:

    With Saxon, for example, I would advise you to save a 
    .SCM file representing the compiled schema; reloading the schema from a 
    .SCM file should be significantly faster than rebuilding it from source 
    schema documents. 

Rich Salz reports that the DataPower products also compile their files first:

    The DataPower products work this way.  XSLT, XSD, WSDL, XACML, etc., files 
    are compiled to object code the first time they're used (or you can 
    pre-load the object cache). Then when actually "used" the object code is 
    executed directly by the CPU(s).

I do not know if the other schema validators provide the option to compile XML Schemas.
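For validators reached through the standard Java API (javax.xml.validation, which the JDK backs with a Xerces-derived implementation), the nearest portable analog is to compile the schema once into a Schema object and reuse it for every document. Unlike SAXON's .SCM file, a JAXP Schema exists only in memory and is not saved between runs. The inline schema below is a hypothetical, cut-down stand-in for Book.xsd:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class CompiledSchemaReuse {
    // Hypothetical stand-in for Book.xsd; a real application would load the file
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + "           targetNamespace='http://www.book.org'"
      + "           elementFormDefault='qualified'>"
      + "  <xs:element name='Book'>"
      + "    <xs:complexType><xs:sequence>"
      + "      <xs:element name='Title' type='xs:string'/>"
      + "      <xs:element name='Author' type='xs:string'/>"
      + "    </xs:sequence></xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    // Parsed and compiled once; Schema objects are immutable and thread-safe
    static final Schema BOOK_SCHEMA = compile(BOOK_XSD);

    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema failed to compile", e);
        }
    }

    /** Validates one document against the precompiled schema; returns true if valid. */
    public static boolean isValid(String xml) {
        // Validators are not thread-safe, so create one per call from the shared Schema
        Validator validator = BOOK_SCHEMA.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // with no ErrorHandler set, validation errors are thrown
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "<Book xmlns='http://www.book.org'><Title>My Life and Times</Title>"
                + "<Author>Paul McCartney</Author></Book>",
            "<Book xmlns='http://www.book.org'><Title>Untitled</Title></Book>" // Author missing
        };
        for (String doc : docs) {
            System.out.println(isValid(doc)); // prints true, then false
        }
    }
}
```

The schema is compiled once, when the class loads. Each later validation skips the load-and-compile step and only creates a Validator.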







Copyright 1993-2007 XML.org. This site is hosted by OASIS