Creating a single XML vocabulary that is appropriately customized to different sub-groups within a community
- From: "Costello, Roger L." <costello@mitre.org>
- To: <xml-dev@lists.xml.org>
- Date: Wed, 9 Jul 2008 10:48:11 -0400
Hi Folks,
I frequently encounter the situation of a community wanting to create a
single XML vocabulary, but within the community are sub-groups that
have different perspectives on what data is relevant and needed. Below
is a discussion on how to deal with this situation. I am interested in
hearing your thoughts on this. /Roger
ISSUE
How do you create a single XML vocabulary, and validate that XML
vocabulary, for a community that has sub-groups that have overlapping
but different data needs?
EXAMPLE
Consider the book community. It comprises:
- book sellers
- book distributors
- book printers
They have overlapping, but different data needs.
For example, the data needed by a book seller is:
- the title of the book
- the author of the book
- the date of publication
- the ISBN
- the publisher
The book distributor has many of the same data needs, but also some
differences:
- the title of the book
- the author of the book
- the size of the book
- the weight of the book
- the mailing cost
And the book printer has overlapping but different needs:
- the size of the book
- the number of pages
How does the book community deal with such differing needs?
APPROACH #1 - MAKE EVERYTHING OPTIONAL
One approach is to define a schema where everything is optional, e.g.
---------------------------------------------------------
book.rng
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<element name="Book" xmlns="http://relaxng.org/ns/structure/1.0"
ns="http://www.books.org">
<optional><element name="Title"><text/></element></optional>
<optional><element name="Author"><text/></element></optional>
<optional><element name="Date"><text/></element></optional>
<optional><element name="ISBN"><text/></element></optional>
<optional><element name="Publisher"><text/></element></optional>
<optional><element name="Size"><text/></element></optional>
<optional><element name="Weight"><text/></element></optional>
<optional><element name="MailingCost"><text/></element></optional>
<optional><element name="NumPages"><text/></element></optional>
</element>
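One detail worth noting: the children of Book in book.rng form a sequence, so although each one is optional, those that are present must appear in the order listed. If the community does not care about order, one possible variant (my assumption, not a requirement of the approach) wraps the children in RELAX NG's interleave pattern. The file name book-unordered.rng is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- book-unordered.rng (hypothetical name): same vocabulary as book.rng,
     but interleave lets the optional children appear in any order. -->
<element name="Book" xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.books.org">
    <interleave>
        <optional><element name="Title"><text/></element></optional>
        <optional><element name="Author"><text/></element></optional>
        <optional><element name="Date"><text/></element></optional>
        <optional><element name="ISBN"><text/></element></optional>
        <optional><element name="Publisher"><text/></element></optional>
        <optional><element name="Size"><text/></element></optional>
        <optional><element name="Weight"><text/></element></optional>
        <optional><element name="MailingCost"><text/></element></optional>
        <optional><element name="NumPages"><text/></element></optional>
    </interleave>
</element>
```

The rest of this discussion uses book.rng as given above.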
Then, each sub-group in the book community uses just the elements they
need, ignoring the others.
Thus,
- the book seller creates XML instance documents comprised of Title,
Author, Date, ISBN, and Publisher, e.g.
---------------------------------------------------------
book-seller.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
<Title>The Wisdom of Crowds</Title>
<Author>James Surowiecki</Author>
<Date>2005</Date>
<ISBN>0-385-72170-6</ISBN>
<Publisher>Anchor Books</Publisher>
</Book>
- the book distributor creates XML instance documents comprised of
Title, Author, Size, Weight, and MailingCost, e.g.
---------------------------------------------------------
book-distributor.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
<Title>The Wisdom of Crowds</Title>
<Author>James Surowiecki</Author>
<Size>5" x 8"</Size>
<Weight>15oz</Weight>
<MailingCost>$3.90</MailingCost>
</Book>
- and the book printer creates XML instance documents comprised of Size
and NumPages, e.g.
---------------------------------------------------------
book-printer.xml
---------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
<Size>5" x 8"</Size>
<NumPages>301</NumPages>
</Book>
DISADVANTAGE
The disadvantage of Approach #1 is that validation is very weak. For
example, a book seller may accidentally add NumPages to his instance
document:
<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns="http://www.books.org">
<Title>The Wisdom of Crowds</Title>
<Author>James Surowiecki</Author>
<Date>2005</Date>
<ISBN>0-385-72170-6</ISBN>
<Publisher>Anchor Books</Publisher>
<NumPages>301</NumPages>
</Book>
Validation against book.rng would not catch this error, because
NumPages is a legal optional child of Book.
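To make the weakness concrete, here is a second instance that book.rng accepts. Because every child element is optional, a Book with no children at all is valid, even though it is useless to every sub-group. The file name is only illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- book-empty.xml (illustrative name): valid against book.rng,
     since the schema does not require any child element. -->
<Book xmlns="http://www.books.org"/>
```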
APPROACH #2 - LAYERED VALIDATION
On July 7, 2008 Rick Jelliffe wrote on the xml-dev list:
> start off with a generic and open/extensible schema, and
> to put version constraints as another layer (you guessed
> it...Schematron).
Yes!
That's it Rick!
I will use the generic grammar-based schema above, and then add a
Schematron business-rules layer on top to constrain it appropriately.
Here is the Schematron schema that applies the constraints needed by
the book seller:
---------------------------------------------------------
book-seller.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:ns uri="http://www.books.org"
prefix="bk" />
<sch:pattern name="Book Sellers">
<sch:p>The book data required for a seller is
title, author, date, ISBN, and publisher.</sch:p>
<sch:rule context="bk:Book">
<sch:assert test="count(bk:Title) = 1 and
count(bk:Author) = 1 and
count(bk:Date) = 1 and
count(bk:ISBN) = 1 and
count(bk:Publisher) = 1 and
count(*[not(self::bk:Title or
self::bk:Author or
self::bk:Date or
self::bk:ISBN or
self::bk:Publisher)]) = 0">
The book data required for a seller is
title, author, date, ISBN, and publisher.
</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
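The schema above puts every constraint into one assert, so a failure reports the same message no matter which constraint was violated. A possible refinement, sketched here under the same Schematron namespace and with the same XPath tests, is to give each constraint its own assert so the message names the actual problem. The file name and the message wording are my own assumptions.

```xml
<?xml version="1.0"?>
<!-- book-seller-detailed.sch (hypothetical name): same constraints as
     book-seller.sch, split into one assert per constraint. -->
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
   <sch:ns uri="http://www.books.org"
           prefix="bk" />
   <sch:pattern name="Book Sellers (detailed messages)">
      <sch:rule context="bk:Book">
         <sch:assert test="count(bk:Title) = 1">
             A seller's Book must contain exactly one Title.
         </sch:assert>
         <sch:assert test="count(bk:Author) = 1">
             A seller's Book must contain exactly one Author.
         </sch:assert>
         <sch:assert test="count(bk:Date) = 1">
             A seller's Book must contain exactly one Date.
         </sch:assert>
         <sch:assert test="count(bk:ISBN) = 1">
             A seller's Book must contain exactly one ISBN.
         </sch:assert>
         <sch:assert test="count(bk:Publisher) = 1">
             A seller's Book must contain exactly one Publisher.
         </sch:assert>
         <!-- Reject any child that is not one of the five seller elements. -->
         <sch:assert test="count(*[not(self::bk:Title or
                                       self::bk:Author or
                                       self::bk:Date or
                                       self::bk:ISBN or
                                       self::bk:Publisher)]) = 0">
             A seller's Book may contain only Title, Author, Date,
             ISBN, and Publisher.
         </sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

With this variant, the seller document that accidentally includes NumPages fails only the last assert, which points directly at the stray element.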
Here is the Schematron schema that applies the constraints needed by
the book distributor:
---------------------------------------------------------
book-distributor.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:ns uri="http://www.books.org"
prefix="bk" />
<sch:pattern name="Book Distributors">
<sch:p>The book data required for a distributor is
title, author, size, weight, and mailing cost.</sch:p>
<sch:rule context="bk:Book">
<sch:assert test="count(bk:Title) = 1 and
count(bk:Author) = 1 and
count(bk:Size) = 1 and
count(bk:Weight) = 1 and
count(bk:MailingCost) = 1 and
count(*[not(self::bk:Title or
self::bk:Author or
self::bk:Size or
self::bk:Weight or
self::bk:MailingCost)]) = 0">
The book data required for a distributor is
title, author, size, weight, and mailing cost.
</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
And here is the Schematron schema that applies the constraints needed
by the book printer:
---------------------------------------------------------
book-printer.sch
---------------------------------------------------------
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:ns uri="http://www.books.org"
prefix="bk" />
<sch:pattern name="Book Printers">
<sch:p>The book data required for a printer is
the size and number of pages.</sch:p>
<sch:rule context="bk:Book">
<sch:assert test="count(bk:Size) = 1 and
count(bk:NumPages) = 1 and
count(*[not(self::bk:Size or
self::bk:NumPages)]) = 0">
The book data required for a printer is
the size and number of pages.
</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
ADVANTAGES
Now we have strong validation. If a book seller accidentally adds
NumPages, the error will be caught.
This approach separates the definition of the community's XML
vocabulary from the constraints needed by each sub-group within the
community. There is a nice separation of concerns. New Schematron
rules can be added to support new business data needs. The grammar
schema that defines the XML vocabulary - book.rng - is simple and easy
to maintain.
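As a sketch of how a new need might be layered on without touching book.rng, suppose (purely hypothetically) that book sellers later decide that Date must be a four-digit year. That could be expressed as one more Schematron schema. The file name, the rule, and the message are all assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- book-seller-date.sch (hypothetical): an extra business rule for
     sellers, added without changing book.rng or book-seller.sch. -->
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
   <sch:ns uri="http://www.books.org"
           prefix="bk" />
   <sch:pattern name="Seller Date Format">
      <sch:rule context="bk:Book/bk:Date">
         <!-- translate() deletes every digit; an empty result means
              the value contained nothing but digits. -->
         <sch:assert test="string-length(.) = 4 and
                           translate(., '0123456789', '') = ''">
             For a seller, Date must be a four-digit year.
         </sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

The test uses only XPath 1.0 functions, so it should work with the same Schematron processor as the schemas above.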
CONCURRENT VALIDATION
Now, to tie things together, what is needed is to validate an XML
instance document against the grammar-based schema plus the appropriate
Schematron schema. This "concurrent validation" is nicely accomplished
using NVDL.
---------------------------------------------------------
book-seller.nvdl
---------------------------------------------------------
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
<namespace ns="http://www.books.org">
<validate schema="book.rng" />
<validate schema="book-seller.sch" />
</namespace>
</rules>
---------------------------------------------------------
book-distributor.nvdl
---------------------------------------------------------
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
<namespace ns="http://www.books.org">
<validate schema="book.rng" />
<validate schema="book-distributor.sch" />
</namespace>
</rules>
---------------------------------------------------------
book-printer.nvdl
---------------------------------------------------------
<?xml version="1.0"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
<namespace ns="http://www.books.org">
<validate schema="book.rng" />
<validate schema="book-printer.sch" />
</namespace>
</rules>
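Both layers are needed because each one catches errors the other misses. The seller document with a stray NumPages passes book.rng but fails book-seller.sch. The reverse case is sketched below: this instance (the file name is illustrative) satisfies every count in book-seller.sch, but book.rng as given rejects it because Author appears before Title.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- book-seller-misordered.xml (illustrative name):
     passes book-seller.sch, which only counts elements;
     fails book.rng, which requires Title before Author. -->
<Book xmlns="http://www.books.org">
    <Author>James Surowiecki</Author>
    <Title>The Wisdom of Crowds</Title>
    <Date>2005</Date>
    <ISBN>0-385-72170-6</ISBN>
    <Publisher>Anchor Books</Publisher>
</Book>
```

Validating with book-seller.nvdl applies both schemas, so the document is reported as invalid.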
SUMMARY
The above discussion illustrates how a community can create a single
XML vocabulary that can be appropriately customized to the needs of
differing sub-groups within the community. The approach used is a
layering approach. A simple grammar-based schema defines the XML
vocabulary. Schematron rules are defined to constrain the XML
vocabulary in a way appropriate to each sub-group within the community.
And NVDL is used to tie together the grammar-based schema with the
Schematron schema.