RE: [xml-dev] 3 approaches to structure lists, plus an analysis of each approach
- From: "Michael Kay" <mike@saxonica.com>
- To: "'Costello, Roger L.'" <costello@mitre.org>,<xml-dev@lists.xml.org>
- Date: Sat, 14 Feb 2009 23:04:17 -0000
There is a vast literature on this subject; see, for example:
http://www.oasis-open.org/committees/sc_home.php?wg_abbrev=ubl-clsc
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Costello, Roger L. [mailto:costello@mitre.org]
> Sent: 14 February 2009 22:41
> To: 'xml-dev@lists.xml.org'
> Subject: [xml-dev] 3 approaches to structure lists, plus an
> analysis of each approach
>
>
> Hi Folks,
>
> What are the different approaches to structuring lists? What
> are the pros and cons of each approach? Is there a way to
> structure lists to maximize their utility and minimize their overhead?
>
> The purpose of this message is to document and analyze
> several approaches to structuring lists. I use a country list
> to illustrate the different approaches.
>
> ASSERTION: LISTS THAT CAN BE USED FOR MULTIPLE PURPOSES ARE GOOD
>
> Lists should be structured in a way that they can be used for
> multiple purposes. For example, a country list may be:
>
> - used as values in an XForms pick list.
>
> - transformed into a document that contains, for each country,
> sales figures (or death rates, births, political leadership,
> religions, etc).
>
> - used to validate an element's content, e.g. the value of the
> <country-visited> element must be a country in the list.
>
> Those are only a few of the myriad uses of a country list. A
> well-designed country list should support all of them.
>
>
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> THREE APPROACHES
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> Below I show three approaches to structuring lists. Other
> approaches are possible, such as comma-separated values.
>
> I illustrate the three approaches using the country list
> example and then follow with an analysis of each approach.
>
>
> APPROACH #1: Express lists using the XML Schema vocabulary:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> targetNamespace="http://www.countries.org"
> xmlns="http://www.countries.org"
> elementFormDefault="qualified">
>
> <xs:element name="countries" type="countriesType" />
>
> <xs:simpleType name="countriesType">
> <xs:restriction base="xs:string">
> <xs:enumeration value="Afghanistan"/>
> <xs:enumeration value="Albania"/>
> <xs:enumeration value="Algeria"/>
> ...
> </xs:restriction>
> </xs:simpleType>
> </xs:schema>
> ---------------------------------------------
>
>
> APPROACH #2: Express lists using the RELAX NG vocabulary:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <grammar xmlns="http://relaxng.org/ns/structure/1.0"
> ns="http://www.countries.org">
>
> <define name="countriesElement">
> <element name="countries">
> <ref name="countriesType" />
> </element>
> </define>
>
> <define name="countriesType">
> <choice>
> <value>Afghanistan</value>
> <value>Albania</value>
> <value>Algeria</value>
> ...
> </choice>
> </define>
> </grammar>
> ---------------------------------------------
>
>
> APPROACH #3: Express lists using domain-specific
> vocabularies. The markup comes from terminology used by
> Subject Matter Experts (SMEs):
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <countries xmlns="http://www.countries.org">
>
> <country>Afghanistan</country>
> <country>Albania</country>
> <country>Algeria</country>
> ...
> </countries>
> ---------------------------------------------
>
>
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> ANALYSIS
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
>
> ANALYSIS OF APPROACH #1 AND APPROACH #2
>
> Approach #1 and approach #2 make it easy to use a list for
> validation purposes. A schema simply imports the list schema
> and then its values are immediately available for validating
> element content.
>
> Here is an XML Schema that imports the country list XML
> Schema and uses its simpleType as the datatype for the
> <country-visited> element:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> targetNamespace="http://www.example.org"
> xmlns:c="http://www.countries.org"
> elementFormDefault="qualified">
>
> <xs:import namespace="http://www.countries.org"
> schemaLocation="countries.xsd" />
>
> <xs:element name="country-visited" type="c:countriesType" />
>
> </xs:schema>
> ---------------------------------------------
>
> Here is a RELAX NG schema that includes the country list
> RELAX NG schema and uses its define element as the datatype
> for the <country-visited> element:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <grammar xmlns="http://relaxng.org/ns/structure/1.0"
> ns="http://www.example.org">
>
> <include href="countries.rng"/>
>
> <start>
> <element name="country-visited">
> <ref name="countriesType" />
> </element>
> </start>
>
> </grammar>
> ---------------------------------------------
>
> The list is tied to one schema language, however: an XML
> Schema cannot import a list expressed in RELAX NG, and a
> RELAX NG schema cannot include a list expressed in XML Schema.
>
> Although these two approaches make it efficient to use lists
> for validation, it is not clear that they are the most
> efficient format for the many other ways a list may be
> used (rendering in a pick list, merging with other lists,
> searching, and so forth). This is discussed further in the
> analysis of approach #3 below.
>
>
> ANALYSIS OF APPROACH #3
>
> Recall that approach #3 uses domain-specific terminology.
> This can be helpful to Subject Matter Experts (SMEs) as they
> maintain the lists.
>
> Validation can be accomplished using a Schematron schema.
> Here is a Schematron schema which validates that the content
> of the <country-visited> element matches one of the values in
> the country list:
>
> ---------------------------------------------
> <?xml version="1.0"?>
> <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
> <sch:ns uri="http://www.countries.org"
> prefix="c" />
> <sch:ns uri="http://www.example.org"
> prefix="ex" />
>
> <sch:pattern name="Country List Check">
>
> <sch:rule context="ex:country-visited">
>
> <sch:assert test=". = document('countries.xml')//c:country">
> The value of country-visited must be one of the
> countries in the countries' list.
> </sch:assert>
>
> </sch:rule>
>
> </sch:pattern>
>
> </sch:schema>
> ---------------------------------------------
>
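For readers who want to see the same membership check outside Schematron, here is a minimal Python sketch using only the standard library. It is an illustration, not part of the original post: the inline `COUNTRIES_XML` stands in for countries.xml, `TRIP_XML` is an invented instance document, and the sketch assumes <country-visited> lives in the http://www.example.org namespace declared by the schemas above. The comparison is an exact string match, like the XPath `=` in the assert.

```python
import xml.etree.ElementTree as ET

COUNTRIES_NS = "http://www.countries.org"
INSTANCE_NS = "http://www.example.org"  # assumed namespace of <country-visited>

# Inline stand-in for countries.xml, truncated to the three entries shown above.
COUNTRIES_XML = """<countries xmlns="http://www.countries.org">
  <country>Afghanistan</country>
  <country>Albania</country>
  <country>Algeria</country>
</countries>"""


def load_countries(xml_text):
    """Return the set of values in an approach #3 country list."""
    root = ET.fromstring(xml_text)
    return {c.text for c in root.iter(f"{{{COUNTRIES_NS}}}country")}


def invalid_visits(instance_xml, countries):
    """Return every <country-visited> value that is not in the list."""
    root = ET.fromstring(instance_xml)
    visited = root.iter(f"{{{INSTANCE_NS}}}country-visited")
    return [v.text for v in visited if v.text not in countries]


# Hypothetical instance document, invented for this illustration.
TRIP_XML = """<trip xmlns="http://www.example.org">
  <country-visited>Albania</country-visited>
  <country-visited>Atlantis</country-visited>
</trip>"""

print(invalid_visits(TRIP_XML, load_countries(COUNTRIES_XML)))
```

The list file is read as plain data here; no schema processor is involved, which is the property approach #3 relies on.
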
> With approach #3 the markup used to construct the list has
> semantics specific to the list:
>
> {http://www.countries.org}countries
> {http://www.countries.org}country
>
> This makes possible the creation of programs that are readily
> understood, as they use terminology consistent with the
> domain. For example, this XSLT program uses the country list
> to generate an HTML list of all countries:
>
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> xmlns:c="http://www.countries.org"
> version="2.0">
>
> <xsl:output method="html"/>
>
> <xsl:template match="c:countries">
>
> <html>
> <head>
> <title>Countries of the World</title>
> </head>
> <body>
> <ol>
> <xsl:apply-templates />
> </ol>
> </body>
> </html>
>
> </xsl:template>
>
> <xsl:template match="c:country">
>
> <li>
> <xsl:value-of select="." />
> </li>
>
> </xsl:template>
>
> </xsl:stylesheet>
> ---------------------------------------------
>
> Note the template match values. They match on:
>
> {http://www.countries.org}countries
> {http://www.countries.org}country
>
>
> Conversely, with approach #1 and approach #2 the markup used
> to construct the list has semantics that are specific to the
> schema language:
>
> {http://www.w3.org/2001/XMLSchema}element
> {http://www.w3.org/2001/XMLSchema}simpleType
> {http://www.w3.org/2001/XMLSchema}restriction
> {http://www.w3.org/2001/XMLSchema}enumeration
>
> {http://relaxng.org/ns/structure/1.0}define
> {http://relaxng.org/ns/structure/1.0}choice
> {http://relaxng.org/ns/structure/1.0}value
>
> Consequently programs must operate using schema terminology
> rather than domain terminology. For example, this XSLT
> program generates an HTML list of all countries from the
> countries list specified by the XML Schema document:
>
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> xmlns:xs="http://www.w3.org/2001/XMLSchema"
> version="2.0">
>
> <xsl:output method="html"/>
>
> <xsl:template match="xs:simpleType">
>
> <html>
> <head>
> <title>Countries of the World</title>
> </head>
> <body>
> <ol>
> <xsl:apply-templates />
> </ol>
> </body>
> </html>
>
> </xsl:template>
>
> <xsl:template match="xs:enumeration">
>
> <li>
> <xsl:value-of select="@value" />
> </li>
>
> </xsl:template>
>
> </xsl:stylesheet>
> ---------------------------------------------
>
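> Note the template match values. Rather than operating on
> <countries> and <country> elements, the XSLT program operates
> on <simpleType> and <enumeration> elements (and, through the
> built-in templates, on <schema> and <restriction>). Nothing in
> these names says "country", and the first template would match
> any other simpleType in the schema as well. This makes
> programming challenging and error-prone.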
>
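The same contrast shows up in general-purpose code. The Python sketch below (standard library only, offered as an illustration rather than anything from the original post) reads the three sample countries from both encodings. The inline documents are truncated stand-ins for the real list files, and the function names are invented here.

```python
import xml.etree.ElementTree as ET

NS = {
    "xs": "http://www.w3.org/2001/XMLSchema",
    "c": "http://www.countries.org",
}

# Truncated stand-in for the approach #1 list.
XSD_LIST = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.countries.org">
  <xs:simpleType name="countriesType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Afghanistan"/>
      <xs:enumeration value="Albania"/>
      <xs:enumeration value="Algeria"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

# Truncated stand-in for the approach #3 list.
DOMAIN_LIST = """<countries xmlns="http://www.countries.org">
  <country>Afghanistan</country>
  <country>Albania</country>
  <country>Algeria</country>
</countries>"""


def countries_from_xsd(xml_text, type_name="countriesType"):
    """Schema terminology: the list is identified only by a name attribute."""
    root = ET.fromstring(xml_text)
    path = f"xs:simpleType[@name='{type_name}']/xs:restriction/xs:enumeration"
    return [e.get("value") for e in root.findall(path, NS)]


def countries_from_domain(xml_text):
    """Domain terminology: the element names say what the data is."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall("c:country", NS)]


print(countries_from_xsd(XSD_LIST))
print(countries_from_domain(DOMAIN_LIST))
```

Both functions return the same values; the difference is that the first has to know the schema's internal layout and the type name, while the second only needs the list's own vocabulary.
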
> With approach #3 a list can be used as a building block (data
> component) which can be immediately dropped into other
> documents to create compound documents. For example, consider
> a list of religions, also formatted using approach #3:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <religions xmlns="http://www.religions.org">
>
> <religion>Baha'i</religion>
> <religion>Buddhism</religion>
> <religion>Catholicism</religion>
> ...
>
> </religions>
> ---------------------------------------------
>
> It is easy to construct a compound document composed of the
> country and religion lists:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <religions-per-country>
> <countries xmlns="http://www.countries.org">
> <country>Afghanistan</country>
> <country>Albania</country>
> <country>Algeria</country>
> ...
> </countries>
> <religions xmlns="http://www.religions.org">
> <religion>Baha'i</religion>
> <religion>Buddhism</religion>
> <religion>Catholicism</religion>
> ...
> </religions>
> <!-- markup that maps religions to countries -->
> </religions-per-country>
> ---------------------------------------------
>
> Due to the modularity provided by approach #3, it is possible
> to perform list-specific processing on this compound
> document. That is, a country-list-aware application would be
> able to extract the country list from this compound document
> and process it. Ditto for a religion-list-aware application.
>
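As a rough illustration of what a list-aware application might do, the Python sketch below pulls each list out of the compound document by namespace alone. It uses only the standard library; the inline `COMPOUND` document is a truncated copy of the example above, and `extract_list` is a hypothetical helper named for this sketch.

```python
import xml.etree.ElementTree as ET

# Truncated copy of the <religions-per-country> compound document.
COMPOUND = """<religions-per-country>
  <countries xmlns="http://www.countries.org">
    <country>Afghanistan</country>
    <country>Albania</country>
    <country>Algeria</country>
  </countries>
  <religions xmlns="http://www.religions.org">
    <religion>Baha'i</religion>
    <religion>Buddhism</religion>
    <religion>Catholicism</religion>
  </religions>
</religions-per-country>"""


def extract_list(xml_text, namespace, item):
    """Return the text of every {namespace}item element, wherever it sits."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.iter(f"{{{namespace}}}{item}")]


countries = extract_list(COMPOUND, "http://www.countries.org", "country")
religions = extract_list(COMPOUND, "http://www.religions.org", "religion")
print(countries)
print(religions)
```

Because each list carries its own namespace, the helper needs no knowledge of the wrapper element or of where the list was placed inside it.
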
> With approach #1 and approach #2 the XML vocabulary used to
> construct the list is the same regardless of the list. Here
> is the <religions-per-country> document using lists that are
> defined using the XML Schema vocabulary:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <religions-per-country>
> <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema"
> name="countriesType">
> <xs:restriction base="xs:string">
> <xs:enumeration value="Afghanistan"/>
> <xs:enumeration value="Albania"/>
> <xs:enumeration value="Algeria"/>
> ...
> </xs:restriction>
> </xs:simpleType>
> <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema"
> name="religionsType">
> <xs:restriction base="xs:string">
> <xs:enumeration value="Baha'i"/>
> <xs:enumeration value="Buddhism"/>
> <xs:enumeration value="Catholicism"/>
> ...
> </xs:restriction>
> </xs:simpleType>
> <!-- markup that maps religions to countries -->
> </religions-per-country>
> ---------------------------------------------
>
> Both lists now use the same namespace (the XML Schema
> namespace), so the country list cannot be distinguished from
> the religion list by namespace; only the name attribute tells
> them apart. Thus, the benefits namespaces provide in terms of
> modularity are negated. It is not easy to create
> country-list-aware applications or religion-list-aware
> applications.
>
> Approach #3 also has minimal markup overhead: one wrapper
> element plus one element per value.
>
>
> ANALYSIS OF ALL APPROACHES
>
> Regardless of which approach is used, the meaning of the list
> and its values must be clearly documented. It may be
> challenging to achieve consensus on meaning:
>
> - The same terminology may be used by different people to
> mean different things. For example, one person expects to see
> Puerto Rico in a country list, whereas another person does
> not. This is because the first person defines "country" to
> include territories and protectorates, whereas the second
> counts only principal sovereignties.
>
> - Further, some people use different terminology to mean the
> same thing. For example, one person calls it "country",
> whereas another calls it "principality."
>
> Thus, with all approaches the issue arises of which
> terminology and definitions to adopt.
>
>
> OTHER FACTORS?
>
> Above is my initial stab at analyzing the three approaches.
> Are there other factors for each approach that I have not considered?
>
> /Roger