Creating a single XML vocabulary that is appropriately customized to different sub-groups within a community

From: "Costello, Roger L." <costello@mitre.org>
To: <xml-dev@lists.xml.org>
Date: Wed, 9 Jul 2008 10:48:11 -0400


Hi Folks,

I frequently encounter the situation of a community wanting to create a
single XML vocabulary, but within the community are sub-groups that
have different perspectives on what data is relevant and needed. Below
is a discussion on how to deal with this situation.  I am interested in
hearing your thoughts on this.  /Roger


ISSUE

How do you create a single XML vocabulary, and validate that XML
vocabulary, for a community that has sub-groups that have overlapping
but different data needs?


EXAMPLE

Consider the book community.  It comprises:

   - book sellers
   - book distributors
   - book printers

They have overlapping, but different data needs.

For example, the data needed by a book seller is:

   - the title of the book
   - the author of the book
   - the date of publication
   - the ISBN
   - the publisher

The book distributor has many of the same data needs, but also some
differences:

   - the title of the book
   - the author of the book
   - the size of the book
   - the weight of the book
   - the mailing cost

And the book printer has overlapping but different needs:

   - the size of the book
   - the number of pages

How does the book community deal with such differing needs?


APPROACH #1 - MAKE EVERYTHING OPTIONAL

One approach is to define a schema where everything is optional, e.g.

---------------------------------------------------------
book.rng
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<element name="Book" xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.books.org">
      <optional><element name="Title"><text/></element></optional>
      <optional><element name="Author"><text/></element></optional>
      <optional><element name="Date"><text/></element></optional>
      <optional><element name="ISBN"><text/></element></optional>
      <optional><element name="Publisher"><text/></element></optional>
      <optional><element name="Size"><text/></element></optional>
      <optional><element name="Weight"><text/></element></optional>
      <optional><element name="MailingCost"><text/></element></optional>
      <optional><element name="NumPages"><text/></element></optional>
</element>

Then, each sub-group in the book community uses just the elements they
need, ignoring the others.  

Thus,

- the book seller creates XML instance documents comprised of Title,
Author, Date, ISBN, and Publisher, e.g.

---------------------------------------------------------
book-seller.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
</Book>

- the book distributor creates XML instance documents comprised of
Title, Author, Size, Weight, and MailingCost, e.g.

---------------------------------------------------------
book-distributor.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Size>5" x 8"</Size>
    <Weight>15oz</Weight>
    <MailingCost>$3.90</MailingCost>
</Book>

- and the book printer creates XML instance documents comprised of Size
and NumPages, e.g.

---------------------------------------------------------
book-printer.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Size>5" x 8"</Size>
    <NumPages>301</NumPages>
</Book>


DISADVANTAGE

The disadvantage of Approach #1 is that validation is very weak.  For
example, a book seller may accidentally add NumPages to his instance
document:

<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
    <NumPages>301</NumPages>
</Book>

Validation would not catch this error.
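
To make the weakness concrete, here is a minimal Python sketch that
approximates what book.rng accepts.  It is a hand-written stand-in, not
a RELAX NG processor; the function name accepted_by_grammar and the
VOCABULARY list are illustrative assumptions, and the check covers only
element names, order, and repetition.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.books.org}"

# Element names in the order book.rng declares them; each one is optional.
VOCABULARY = ["Title", "Author", "Date", "ISBN", "Publisher",
              "Size", "Weight", "MailingCost", "NumPages"]


def accepted_by_grammar(xml_text):
    """Hand-written approximation of book.rng, not a RELAX NG processor."""
    book = ET.fromstring(xml_text)
    if book.tag != NS + "Book":
        return False
    next_allowed = 0  # index of the first vocabulary entry still permitted
    for child in book:
        if not child.tag.startswith(NS):
            return False
        name = child.tag[len(NS):]
        if name not in VOCABULARY[next_allowed:]:
            return False  # unknown, repeated, or out-of-order element
        next_allowed = VOCABULARY.index(name) + 1
    return True


SELLER_WITH_NUMPAGES = """\
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
    <NumPages>301</NumPages>
</Book>"""

# The stray NumPages is accepted, because every element is optional.
print(accepted_by_grammar(SELLER_WITH_NUMPAGES))  # True
```

The grammar has no notion of which sub-group produced the document, so
it cannot tell that NumPages is out of place for a seller.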


APPROACH #2 - LAYERED VALIDATION

On July 7, 2008 Rick Jelliffe wrote on the xml-dev list:

> start off with a generic and open/extensible schema, and
> to put version constraints as another layer (you guessed
> it...Schematron).

Yes!

That's it, Rick!

I will use the generic grammar-based schema above, and then add a
Schematron business-rules layer on top to constrain it appropriately.

Here is the Schematron schema that applies the constraints needed by
the book seller:

---------------------------------------------------------
book-seller.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:ns uri="http://www.books.org"
           prefix="bk" />

   <sch:pattern name="Book Sellers">

      <sch:p>The book data required for a seller is 
             title, author, date, ISBN, and publisher.</sch:p> 

      <sch:rule context="bk:Book">

         <sch:assert test="count(bk:Title) = 1 and
                           count(bk:Author) = 1 and
                           count(bk:Date) = 1 and
                           count(bk:ISBN) = 1 and
                           count(bk:Publisher) = 1 and
                           count(*[not(self::bk:Title or 
                                       self::bk:Author or 
                                       self::bk:Date or 
                                       self::bk:ISBN or 
                                       self::bk:Publisher)]) = 0">
             The book data required for a seller is 
             title, author, date, ISBN, and publisher.
         </sch:assert>

      </sch:rule>

   </sch:pattern>

</sch:schema>

Here is the Schematron schema that applies the constraints needed by
the book distributor:

---------------------------------------------------------
book-distributor.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:ns uri="http://www.books.org"
           prefix="bk" />

   <sch:pattern name="Book Distributors">

      <sch:p>The book data required for a distributor is 
             title, author, size, weight, and mailing cost.</sch:p> 

      <sch:rule context="bk:Book">

         <sch:assert test="count(bk:Title) = 1 and
                           count(bk:Author) = 1 and
                           count(bk:Size) = 1 and
                           count(bk:Weight) = 1 and
                           count(bk:MailingCost) = 1 and
                           count(*[not(self::bk:Title or 
                                       self::bk:Author or 
                                       self::bk:Size or 
                                       self::bk:Weight or 
                                       self::bk:MailingCost)]) = 0">
             The book data required for a distributor is 
             title, author, size, weight, and mailing cost.
         </sch:assert>

      </sch:rule>

   </sch:pattern>

</sch:schema>

And here is the Schematron schema that applies the constraints needed
by the book printer:

---------------------------------------------------------
book-printer.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

   <sch:ns uri="http://www.books.org"
           prefix="bk" />

   <sch:pattern name="Book Printers">

      <sch:p>The book data required for a printer is 
             the size and number of pages.</sch:p> 

      <sch:rule context="bk:Book">

         <sch:assert test="count(bk:Size) = 1 and
                           count(bk:NumPages) = 1 and
                           count(*[not(self::bk:Size or 
                                       self::bk:NumPages)]) = 0">
             The book data required for a printer is 
             the size and number of pages.
         </sch:assert>

      </sch:rule>

   </sch:pattern>

</sch:schema>

ADVANTAGES

Now we have strong validation. If a book seller accidentally adds
NumPages, the error will be caught.
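
The logic of the three assertions can be sketched in Python as follows.
This only mirrors the XPath tests above (each required element exactly
once, and no other child elements); it is not a Schematron
implementation, and the names REQUIRED and rule_violations are
illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = "http://www.books.org"

# Required children per sub-group, copied from the three .sch files.
REQUIRED = {
    "seller": {"Title", "Author", "Date", "ISBN", "Publisher"},
    "distributor": {"Title", "Author", "Size", "Weight", "MailingCost"},
    "printer": {"Size", "NumPages"},
}


def rule_violations(xml_text, group):
    """Return a list of problems; an empty list means the rules pass."""
    book = ET.fromstring(xml_text)
    counts = Counter(child.tag for child in book)
    required = {"{%s}%s" % (NS, name) for name in REQUIRED[group]}
    problems = []
    # Mirrors count(bk:X) = 1 for every required element.
    for tag in sorted(required):
        if counts[tag] != 1:
            problems.append("expected exactly one %s, found %d"
                            % (tag, counts[tag]))
    # Mirrors count(*[not(self::bk:X or ...)]) = 0.
    for tag in sorted(set(counts) - required):
        problems.append("unexpected element %s" % tag)
    return problems


SELLER_WITH_NUMPAGES = """\
<Book xmlns="http://www.books.org">
    <Title>The Wisdom of Crowds</Title>
    <Author>James Surowiecki</Author>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
    <NumPages>301</NumPages>
</Book>"""

print(rule_violations(SELLER_WITH_NUMPAGES, "seller"))
# ['unexpected element {http://www.books.org}NumPages']
```

Unlike the single combined assertion in the schemas above, this sketch
reports each problem separately, which is a design choice of the
sketch rather than something the Schematron schemas do.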

This approach separates the definition of the community's XML
vocabulary from the constraints needed by each sub-group within the
community, giving a clean separation of concerns.  New Schematron
rules can be added as new business needs arise.  The grammar schema
that defines the XML vocabulary - book.rng - stays simple and easy to
maintain.


CONCURRENT VALIDATION

Now, to tie things together, what is needed is to validate an XML
instance document against the grammar-based schema plus the appropriate
Schematron schema.  This "concurrent validation" is nicely accomplished
using NVDL.

---------------------------------------------------------
book-seller.nvdl
--------------------------------------------------------- 
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">

   <namespace ns="http://www.books.org">
     <validate schema="book.rng" />
     <validate schema="book-seller.sch" />
   </namespace>

</rules>

---------------------------------------------------------
book-distributor.nvdl
--------------------------------------------------------- 
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">

   <namespace ns="http://www.books.org">
     <validate schema="book.rng" />
     <validate schema="book-distributor.sch" />
   </namespace>

</rules>

---------------------------------------------------------
book-printer.nvdl
--------------------------------------------------------- 
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">

   <namespace ns="http://www.books.org">
     <validate schema="book.rng" />
     <validate schema="book-printer.sch" />
   </namespace>

</rules>
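
The effect of running both layers over the same document can be
imitated in a few lines of Python.  This is only a rough model of what
the NVDL scripts request, under the assumption that both validations
are run and their errors are pooled; it does not use an NVDL, RELAX NG,
or Schematron engine, and grammar_layer, rules_layer, and validate are
hypothetical names.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.books.org}"

# Declaration order from book.rng.
VOCABULARY = ["Title", "Author", "Date", "ISBN", "Publisher",
              "Size", "Weight", "MailingCost", "NumPages"]

# Required children per sub-group, from the three .sch files.
REQUIRED = {
    "seller": ["Title", "Author", "Date", "ISBN", "Publisher"],
    "distributor": ["Title", "Author", "Size", "Weight", "MailingCost"],
    "printer": ["Size", "NumPages"],
}


def grammar_layer(book):
    """Stand-in for book.rng: known names, declared order, no repeats."""
    names = [child.tag for child in book]
    in_order = [NS + n for n in VOCABULARY if NS + n in names]
    if names == in_order:
        return []
    return ["grammar: children do not match book.rng"]


def rules_layer(book, group):
    """Stand-in for book-<group>.sch: exactly the group's elements, once."""
    names = sorted(child.tag for child in book)
    wanted = sorted(NS + n for n in REQUIRED[group])
    if names == wanted:
        return []
    return ["rules: not a valid %s document" % group]


def validate(xml_text, group):
    """Run both layers on the same document and pool their errors."""
    book = ET.fromstring(xml_text)
    if book.tag != NS + "Book":
        return ["grammar: root element is not Book"]
    return grammar_layer(book) + rules_layer(book, group)


PRINTER_DOC = """\
<Book xmlns="http://www.books.org">
    <Size>5" x 8"</Size>
    <NumPages>301</NumPages>
</Book>"""

print(validate(PRINTER_DOC, "printer"))  # []
print(validate(PRINTER_DOC, "seller"))   # ['rules: not a valid seller document']
```

Each layer catches something the other does not: the rules layer
rejects a printer document offered as a seller document, while the
grammar layer is the one that would reject Size and NumPages appearing
in the wrong order.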


SUMMARY

The above discussion illustrates how a community can create a single
XML vocabulary that can be appropriately customized to the needs of
differing sub-groups within the community.  The approach used is a
layering approach.  A simple grammar-based schema defines the XML
vocabulary.  Schematron rules are defined to constrain the XML
vocabulary in a way appropriate to each sub-group within the community.
And NVDL is used to tie together the grammar-based schema with the
Schematron schema.
