Oops! A colleague found a couple bugs in my algorithm. I should use sets not lists, and I forgot to update the ExaminedSet. Do you see any bugs in this updated algorithm?

Below is an algorithm for traversing all the schemas. It utilizes two sets:

a. ToExamineSet: this set contains all the schemas that are currently scheduled to be examined (visited/processed). It is initialized with the "main" schema.

b. ExaminedSet: this set contains all the schemas that have already been examined (visited/processed). It is initialized to the empty set.

ToExamineSet ← {main schema};
ExaminedSet ← {};
while (ToExamineSet ≠ ∅) do
      remove schema s from ToExamineSet;
      add s to ExaminedSet;
      for each include/import i in s do
            t ← dereference (i);
            if t ∉ ExaminedSet then
                  add t to ToExamineSet;

/Roger

From: Costello, Roger L.
Sent: Friday, May 15, 2015 3:03 PM
To: xml-dev@lists.xml.org
Subject: A prose description of traversing all the schemas? An algorithm for how to traverse all the schemas?

Hi Folks,

Recall that an XML Schema can reference other schemas, using the include and import elements. And those schemas can then reference other schemas. And so forth.

I am writing a description of this. Specifically, I am writing two things:

1. A prose description of traversing all the schemas.

2. An algorithm for how to traverse all the schemas.

The prose description must be technically correct, but also understandable by a non-technical person.

I made a go at writing these things. See below. Can you think of a better way to write the prose? Do you see any errors in my algorithm?

The first thing that I realized is that traversing a schema and its imported/included schemas is a graph-traversal problem! For example, this diagram nicely shows that schemas can form a graph:

See the bottom of this message for the actual schemas.

This was a key insight for me, as it means that I can leverage all the graph traversing algorithms that have been developed over the past 50 years.

1. A Prose Description of Traversing all the Schemas

This is the prose description that I initially came up with:

Start with the “main” schema and then follow its include and import elements, recursively gobbling up all schemas.

I rejected that. It is terribly ambiguous: What does "gobbling up" mean? What will "recursively" mean to a non-technical person?

Then I came up with this description:

The main schema is the initial schema to be examined.

For each schema to be examined, examine its content for include/import elements. The schemas pointed to by each include/import element are then considered schemas to be examined.

If a schema is encountered a second time, it is not examined again.

I think that's pretty good. A technical person will notice the massively recursive nature of that description. A non-technical person will probably not realize all that is implied and will simply ignore it.

Although I am pretty pleased with that description, I have no doubt that it can be improved. I look forward to your improvements.

2. An Algorithm for how to Traverse all the Schemas

Below is an algorithm for traversing all the schemas. It utilizes two lists:

a. ToExamineList: this list contains all the schemas that are currently scheduled to be examined (visited/processed). It is initialized with the "main" schema.

b. ExaminedList: this list contains all the schemas that have already been examined (visited/processed). It is initialized to the empty list (i.e., empty set).

ToExamineList ← main schema;
ExaminedList ← {}
while (ToExamineList ≠ ∅) do
      remove schema s from ToExamineList;
      for each include/import i in s do
            t ← dereference (i);
            if t ∉ ExaminedList then
                  add t to ToExamineList

Do you see any errors in that algorithm? Is there a better (i.e., simpler) algorithm?

XML Schema Example

Above I promised to show the XML Schemas for the example. Here they are:

A.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:include schemaLocation="B.xsd" />
    <xs:include schemaLocation="C.xsd" />

    <xs:element name="A">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="B" />
                <xs:element ref="C" />
            </xs:sequence>
       </xs:complexType>
    </xs:element>

</xs:schema>

B.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:include schemaLocation="D.xsd" />

    <xs:element name="B">
        <xs:complexType>
            <xs:sequence>
               <xs:element ref="D" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

C.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:include schemaLocation="D.xsd" />

    <xs:element name="C">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="D" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

D.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:include schemaLocation="A.xsd" />

    <xs:element name="D">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="A" minOccurs="0" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

And here is a schema-valid instance document: