
 



RE: [xml-dev] 3 approaches to structure lists, plus an analysis of each approach

There is a vast literature on this subject; see, for example,

http://www.oasis-open.org/committees/sc_home.php?wg_abbrev=ubl-clsc

Michael Kay
http://www.saxonica.com/
 

> -----Original Message-----
> From: Costello, Roger L. [mailto:costello@mitre.org] 
> Sent: 14 February 2009 22:41
> To: 'xml-dev@lists.xml.org'
> Subject: [xml-dev] 3 approaches to structure lists, plus an 
> analysis of each approach
> 
> 
> Hi Folks,
> 
> What are the different approaches to structuring lists? What 
> are the pros and cons of each approach? Is there a way to 
> structure lists to maximize their utility and minimize their overhead?
> 
> The purpose of this message is to document and analyze 
> several approaches to structuring lists. I use "country list" 
> to illustrate the different approaches.
> 
> ASSERTION: LISTS THAT CAN BE USED FOR MULTIPLE PURPOSES ARE GOOD
> 
> Lists should be structured so that they can be used for 
> multiple purposes. For example, a country list may be:
> 
>     - used as values in an XForms pick list.
> 
>     - transformed into a document that contains, for each country, 
>       sales figures (or death rates, births, political leadership, 
>       religions, etc).
> 
>     - used to validate an element's content, e.g., the value of the 
>       <country-visited> element must be a country.
> 
> Those are only a few of the myriad uses of a country list. A 
> well-designed country list should support all of them.
> 
> 
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>          THREE APPROACHES
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
> 
> Below I show three approaches to structuring lists. Other 
> approaches are possible, such as comma-separated values.
>  
> I illustrate the three approaches using the country list 
> example and then follow with an analysis of each approach.
> 
> 
> APPROACH #1: Express lists using the XML Schema vocabulary:
> 
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            targetNamespace="http://www.countries.org"
>            xmlns="http://www.countries.org"
>            elementFormDefault="qualified">
> 
>     <xs:element name="countries" type="countriesType" />
> 
>     <xs:simpleType name="countriesType">
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="Afghanistan"/>
>             <xs:enumeration value="Albania"/>
>             <xs:enumeration value="Algeria"/>
>             ...
>         </xs:restriction>
>     </xs:simpleType>
> </xs:schema>
> ---------------------------------------------
> 
> 
> APPROACH #2: Express lists using the RELAX NG vocabulary:
> 
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <grammar xmlns="http://relaxng.org/ns/structure/1.0"
>          ns="http://www.countries.org">
> 
>     <define name="countriesElement">
>         <element name="countries">
>             <ref name="countriesType" />
>         </element>
>     </define>
> 
>     <define name="countriesType">
>         <choice>
>             <value>Afghanistan</value>
>             <value>Albania</value>
>             <value>Algeria</value>
>             ...
>         </choice>
>     </define>
> </grammar>
> ---------------------------------------------
> 
> 
> APPROACH #3: Express lists using domain-specific 
> vocabularies. The markup comes from terminology used by 
> Subject Matter Experts (SMEs):
> 
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <countries xmlns="http://www.countries.org";>
> 
>     <country>Afghanistan</country>
>     <country>Albania</country>
>     <country>Algeria</country>
>     ...
> </countries>
> ---------------------------------------------
> 
> 
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>          ANALYSIS
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> 
> 
> ANALYSIS OF APPROACH #1 AND APPROACH #2
> 
> Approach #1 and approach #2 make it easy to use a list for 
> validation purposes. A schema simply imports (XML Schema) or 
> includes (RELAX NG) the list schema, and its values are 
> immediately available for validating element content. 
> 
> Here is an XML Schema that imports the country list XML 
> Schema and uses its simpleType as the datatype for the 
> <country-visited> element:
> 
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            targetNamespace="http://www.example.org"
>            xmlns:c="http://www.countries.org"
>            elementFormDefault="qualified">
> 
>     <xs:import namespace="http://www.countries.org"
>                schemaLocation="countries.xsd" />
> 
>     <xs:element name="country-visited" type="c:countriesType" />
> 
> </xs:schema>
> ---------------------------------------------
> 
> Here is a RELAX NG schema that includes the country list 
> RELAX NG schema and references its countriesType pattern as 
> the content of the <country-visited> element:
> 
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <grammar xmlns="http://relaxng.org/ns/structure/1.0"
>          ns="http://www.example.org">
> 
>     <include href="countries.rng"/>
> 
>     <start>
>         <element name="country-visited">
>            <ref name="countriesType" />
>         </element>
>     </start>
> 
> </grammar>
> ---------------------------------------------
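> 
> For illustration, a minimal instance document that should be 
> valid against either of the two importing schemas above. The 
> value "Atlantis" mentioned afterwards is a hypothetical 
> counter-example:
> 
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <!-- Valid: "Albania" is one of the enumerated country values -->
> <country-visited xmlns="http://www.example.org">Albania</country-visited>
> ---------------------------------------------
> 
> Replacing "Albania" with a value outside the list, such as 
> "Atlantis", should make the document invalid under both schemas.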
> 
> If the importing schema is written in XML Schema, it can't 
> use a list expressed in RELAX NG, and vice versa.
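> 
> One way around this is to keep a single master list and 
> generate each schema from it. A minimal XSLT 2.0 sketch, 
> assuming the master list is the approach #3 document 
> (countries.xml) and that the output should be the approach #1 
> simpleType:
> 
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                 xmlns:c="http://www.countries.org"
>                 exclude-result-prefixes="c"
>                 version="2.0">
> 
>     <xsl:output method="xml" indent="yes"/>
> 
>     <!-- Each c:country becomes one xs:enumeration facet -->
>     <xsl:template match="/c:countries">
>         <xs:schema targetNamespace="http://www.countries.org"
>                    elementFormDefault="qualified">
>             <xs:simpleType name="countriesType">
>                 <xs:restriction base="xs:string">
>                     <xsl:for-each select="c:country">
>                         <xs:enumeration value="{.}"/>
>                     </xsl:for-each>
>                 </xs:restriction>
>             </xs:simpleType>
>         </xs:schema>
>     </xsl:template>
> 
> </xsl:stylesheet>
> ---------------------------------------------
> 
> A second template emitting <choice> and <value> elements could 
> produce the RELAX NG version from the same source.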
> 
> Although these two approaches enable the efficient usage of 
> lists for validation, it's not clear that they are the most 
> efficient format for the myriad other ways that a list may be 
> used (rendering in a pick list, merging with other lists, 
> searching, and so forth). This is discussed further in the 
> analysis of approach #3 below.
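> 
> As an illustration of the pick-list case, a rough XForms 1.1 
> sketch that loads the approach #1 schema (countries.xsd) as an 
> instance and builds the items from it. The host page, the 
> "trip" instance, and its ex:trip element are hypothetical:
> 
> ---------------------------------------------
> <html xmlns="http://www.w3.org/1999/xhtml"
>       xmlns:xf="http://www.w3.org/2002/xforms"
>       xmlns:xs="http://www.w3.org/2001/XMLSchema"
>       xmlns:ex="http://www.example.org">
>     <head>
>         <title>Trip report</title>
>         <xf:model>
>             <!-- Hypothetical form data; the first instance is the default -->
>             <xf:instance id="trip">
>                 <ex:trip>
>                     <ex:country-visited/>
>                 </ex:trip>
>             </xf:instance>
>             <!-- The approach #1 list, loaded as-is -->
>             <xf:instance id="countries" src="countries.xsd"/>
>         </xf:model>
>     </head>
>     <body>
>         <xf:select1 ref="ex:country-visited">
>             <xf:label>Country visited</xf:label>
>             <!-- Items are located via schema terms, not domain terms -->
>             <xf:itemset nodeset="instance('countries')//xs:enumeration">
>                 <xf:label ref="@value"/>
>                 <xf:value ref="@value"/>
>             </xf:itemset>
>         </xf:select1>
>     </body>
> </html>
> ---------------------------------------------
> 
> It works, but the itemset must navigate xs:enumeration/@value 
> rather than anything that says "country".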
> 
> 
> ANALYSIS OF APPROACH #3
> 
> Recall that approach #3 uses domain-specific terminology. 
> This can be helpful to Subject Matter Experts (SMEs) as they 
> maintain the lists.
> 
> Validation can be accomplished using a Schematron schema. 
> Here is a Schematron schema which validates that the content 
> of the <country-visited> element matches one of the values in 
> the country list:
> 
> ---------------------------------------------
> <?xml version="1.0"?>
> <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
>    <sch:ns uri="http://www.countries.org"
>            prefix="c" />
>    <sch:ns uri="http://www.example.org"
>            prefix="ex" />
> 
>    <sch:pattern name="Country List Check">
> 
>       <sch:rule context="ex:country-visited">
> 
>          <sch:assert test=". = document('countries.xml')//c:country">
>              The value of country-visited must be one of the
>              countries in the country list.
>          </sch:assert>
> 
>       </sch:rule>
> 
>    </sch:pattern>
> 
> </sch:schema>
> ---------------------------------------------
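> 
> The schema above uses the Schematron 1.5 namespace. For 
> comparison, a sketch of the same check in ISO Schematron with 
> the XSLT 2.0 query binding. The ex prefix assumes that 
> <country-visited> is in the http://www.example.org namespace, 
> the target namespace of the validation schemas shown earlier:
> 
> ---------------------------------------------
> <?xml version="1.0"?>
> <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
>             queryBinding="xslt2">
>    <sch:ns uri="http://www.countries.org" prefix="c" />
>    <sch:ns uri="http://www.example.org" prefix="ex" />
> 
>    <!-- Load the country list once; relative path as in the 1.5 version -->
>    <sch:let name="countries"
>             value="document('countries.xml')//c:country" />
> 
>    <sch:pattern id="country-list-check">
>       <sch:rule context="ex:country-visited">
>          <sch:assert test=". = $countries">
>              "<sch:value-of select="." />" is not one of the
>              countries in the country list.
>          </sch:assert>
>       </sch:rule>
>    </sch:pattern>
> </sch:schema>
> ---------------------------------------------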
> 
> With approach #3 the markup used to construct the list has 
> semantics specific to the list:
> 
> {http://www.countries.org}countries
> {http://www.countries.org}country
> 
> This makes it possible to write programs that are easy to 
> understand, because they use terminology consistent with the 
> domain. For example, this XSLT program uses the country list 
> to generate an HTML list of all countries:
> 
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:c="http://www.countries.org";
>                 version="2.0">
>  
>     <xsl:output method="html"/>
> 
>     <xsl:template match="c:countries">
> 
>         <html>
>             <head>
>                 <title>Countries of the World</title>
>             </head>
>             <body>
>                 <ol>
>                     <xsl:apply-templates />
>                 </ol>
>             </body>
>         </html>
> 
>     </xsl:template>
> 
>     <xsl:template match="c:country">
> 
>         <li>
>             <xsl:value-of select="." />
>         </li>
> 
>     </xsl:template>
> 
> </xsl:stylesheet>
> ---------------------------------------------
> 
> Note the template match values. They match on:
> 
> {http://www.countries.org}countries
> {http://www.countries.org}country
>  
> 
> Conversely, with approach #1 and approach #2 the markup used 
> to construct the list has semantics that are specific to the 
> schema language:
> 
> {http://www.w3.org/2001/XMLSchema}element
> {http://www.w3.org/2001/XMLSchema}simpleType
> {http://www.w3.org/2001/XMLSchema}restriction
> {http://www.w3.org/2001/XMLSchema}enumeration
> 
> {http://relaxng.org/ns/structure/1.0}define
> {http://relaxng.org/ns/structure/1.0}choice
> {http://relaxng.org/ns/structure/1.0}value
> 
> Consequently, programs must operate using schema terminology 
> rather than domain terminology. For example, this XSLT 
> program generates an HTML list of all countries from the 
> countries list specified by the XML Schema document:
> 
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                 version="2.0">
>  
>     <xsl:output method="html"/>
> 
>     <xsl:template match="xs:simpleType">
> 
>         <html>
>             <head>
>                 <title>Countries of the World</title>
>             </head>
>             <body>
>                 <ol>
>                     <xsl:apply-templates />
>                 </ol>
>             </body>
>         </html>
> 
>     </xsl:template>
> 
>     <xsl:template match="xs:enumeration">
> 
>         <li>
>             <xsl:value-of select="@value" />
>         </li>
> 
>     </xsl:template>
> 
> </xsl:stylesheet>
> ---------------------------------------------
> 
> Note the template match values. Rather than operating on 
> <countries> and <country> elements, the XSLT program operates 
> on <schema>, <simpleType>, <restriction>, and <enumeration> 
> elements. This makes programming challenging and error-prone.
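> 
> The same is true of approach #2. A sketch of the corresponding 
> stylesheet for the RELAX NG list (countries.rng), which has to 
> navigate <define>, <choice>, and <value> and pick the right 
> <define> by its name attribute:
> 
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:rng="http://relaxng.org/ns/structure/1.0"
>                 version="2.0">
> 
>     <xsl:output method="html"/>
> 
>     <xsl:template match="/">
>         <html>
>             <head>
>                 <title>Countries of the World</title>
>             </head>
>             <body>
>                 <ol>
>                     <!-- The country values are rng:value children of a named choice -->
>                     <xsl:for-each select="//rng:define[@name = 'countriesType']/rng:choice/rng:value">
>                         <li><xsl:value-of select="." /></li>
>                     </xsl:for-each>
>                 </ol>
>             </body>
>         </html>
>     </xsl:template>
> 
> </xsl:stylesheet>
> ---------------------------------------------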
> 
> With approach #3 a list can be used as a building block (data 
> component) which can be immediately dropped into other 
> documents to create compound documents. For example, consider 
> a list of religions, also formatted using approach #3:
> 
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <religions xmlns="http://www.religions.org">
> 
>     <religion>Baha'i</religion>
>     <religion>Buddhism</religion>
>     <religion>Catholicism</religion>
>     ...
> 
> </religions>
> ---------------------------------------------
> 
> It is easy to construct a compound document composed of the 
> country and religion lists:
> 
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <religions-per-country>
>     <countries xmlns="http://www.countries.org">
>         <country>Afghanistan</country>
>         <country>Albania</country>
>         <country>Algeria</country>
>         ...
>     </countries>
>     <religions xmlns="http://www.religions.org">
>         <religion>Baha'i</religion>
>         <religion>Buddhism</religion>
>         <religion>Catholicism</religion>
>         ...
>     </religions>
>     <!-- markup that maps religions to countries --> 
> </religions-per-country>
> ---------------------------------------------
> 
> Due to the modularity provided by approach #3, it is possible 
> to perform list-specific processing on this compound 
> document. That is, a country-list-aware application would be 
> able to extract the country list from this compound document 
> and process it. Ditto for a religion-list-aware application.
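> 
> A minimal sketch of such a country-list-aware application, 
> assuming the country-list stylesheet above is saved as 
> countries-to-html.xsl (a hypothetical file name). It imports 
> that stylesheet and hands it only the <countries> element, 
> selected by namespace:
> 
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:c="http://www.countries.org"
>                 version="2.0">
> 
>     <!-- Hypothetical file holding the c:countries / c:country templates -->
>     <xsl:import href="countries-to-html.xsl"/>
> 
>     <!-- Skip the rest of the compound document; process only the country list -->
>     <xsl:template match="/">
>         <xsl:apply-templates select="//c:countries"/>
>     </xsl:template>
> 
> </xsl:stylesheet>
> ---------------------------------------------
> 
> Because it selects by namespace only, the same stylesheet 
> should also work on the standalone countries.xml.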
> 
> With approach #1 and approach #2 the XML vocabulary used to 
> construct the list is the same regardless of the list. Here 
> is the <religions-per-country> document using lists that are 
> defined using the XML Schema vocabulary: 
> 
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <religions-per-country>
>     <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                    name="countriesType">
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="Afghanistan"/>
>             <xs:enumeration value="Albania"/>
>             <xs:enumeration value="Algeria"/>
>             ...
>         </xs:restriction>
>     </xs:simpleType>
>     <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                    name="religionsType">
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="Baha'i"/>
>             <xs:enumeration value="Buddhism"/>
>             <xs:enumeration value="Catholicism"/>
>             ...
>         </xs:restriction>
>     </xs:simpleType>
>     <!-- markup that maps religions to countries --> 
> </religions-per-country>
> ---------------------------------------------
> 
> Both lists are expressed in the same namespace (the XML Schema 
> namespace), so the country list cannot be distinguished from 
> the religion list by namespace. Thus, the benefits namespaces 
> provide in terms of modularity are negated, and it is not easy 
> to create country-list-aware or religion-list-aware applications.
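> 
> By contrast, a sketch of the equivalent extraction for the 
> XML Schema version has to rely on the value of the name 
> attribute rather than on a namespace; the type name is only a 
> naming convention, not something the markup guarantees:
> 
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                 version="2.0">
> 
>     <xsl:output method="html"/>
> 
>     <!-- Both lists are xs:simpleType elements; only @name tells them apart -->
>     <xsl:template match="/">
>         <ol>
>             <xsl:for-each select="//xs:simpleType[@name = 'countriesType']//xs:enumeration">
>                 <li><xsl:value-of select="@value" /></li>
>             </xsl:for-each>
>         </ol>
>     </xsl:template>
> 
> </xsl:stylesheet>
> ---------------------------------------------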
>  
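> Approach #3 also has minimal markup overhead: each value needs 
> only a <country> element, rather than an <xs:enumeration> or 
> <value> element nested inside <xs:simpleType>/<xs:restriction> 
> or <define>/<choice>.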
> 
> 
> ANALYSIS OF ALL APPROACHES
> 
> Regardless of which approach is used, the meaning of the list 
> and its values must be clearly documented. It may be 
> challenging to achieve consensus on meaning:
> 
> - The same terminology may be used by different people to 
> mean different things. For example, one person expects to see 
> Puerto Rico in a country list, whereas another person does 
> not. This is because one person defines "country" only as 
> principal sovereignties whereas another person defines 
> "country" to include territories and protectorates. 
> 
> - Further, some people use different terminology to mean the 
> same thing. For example, one person calls it "country"; 
> another calls it "principality."
> 
> Thus, with all approaches the issue arises of which 
> terminology and definitions to adopt.
> 
> 
> OTHER FACTORS?
> 
> Above is my initial stab at analyzing the three approaches. 
> Are there other factors of each approach that I have not considered? 
> 
> /Roger
> _______________________________________________________________________
> 
> XML-DEV is a publicly archived, unmoderated list hosted by 
> OASIS to support XML implementation and development. To 
> minimize spam in the archives, you must subscribe before posting.
> 
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org List archive: 
> http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
> 






Copyright 1993-2007 XML.org. This site is hosted by OASIS