Re: [xml-dev] XML Design: a series of an item or a collection of specific types of items?

Roger,

If people enter the info manually, a likely problem is someone
messing up the content of your Type or Age-Group elements.  It
could be an everyday typo, or a misunderstanding of what Type
means (someone might enter paperback, hardback, PDF, ePub, HTML,
etc.).  A DTD can't restrict element content to a fixed list of
values, so you could avoid this problem only if you can forbid
the use of a DTD and are sure that every validation tool in use
can limit the possible content of the Type and Age-Group
elements.  Also, I would use "Subject" instead of "Field".
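
To make the failure mode concrete, here is a minimal sketch of
the kind of check such a validation tool performs on Design #1.
It assumes Python's standard xml.etree.ElementTree; the ALLOWED
table and the find_bad_values name are hypothetical, and the
allowed values are simply the ones named in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabularies, built from the values named in the thread.
ALLOWED = {
    "Type": {"Fiction", "Non-Fiction"},
    "Age-Group": {"Teen", "Young Adult", "Adult", "All"},
}


def find_bad_values(xml_text):
    """Return (element name, text) pairs whose content is not in ALLOWED."""
    root = ET.fromstring(xml_text)
    bad = []
    for elem in root.iter():
        allowed = ALLOWED.get(elem.tag)
        if allowed is None:
            continue  # Title, Author, etc. are free text
        text = (elem.text or "").strip()
        if text not in allowed:
            bad.append((elem.tag, text))
    return bad


# One misunderstanding of Type and one typo in Age-Group.
SAMPLE = """<Books>
    <Book>
        <Type>Paperback</Type>
        <Age-Group>Al</Age-Group>
        <Title>The Alchemist</Title>
        <Author>Paulo Coelho</Author>
    </Book>
</Books>"""

print(find_bad_values(SAMPLE))
```

A schema language with enumerations does the same job
declaratively; the point is only that something has to hold the
list of legal values, and a DTD can't do that for element content.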

To allow the use of a DTD as well as other schema languages I
would...

1. make Subject an attribute and add "fiction" to "math,
   chemistry, physics, or astronomy"; and

2. make Age-Group an attribute and add "nil" to "teen, young
   adult, adult, or all" while hyphenating young-adult.

Resulting in...

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE Books [
<!ELEMENT Books (Book)+ >
<!ATTLIST Book
  Subject (fiction|math|chemistry|physics|astronomy) #REQUIRED
  Age-Group (teen|young-adult|adult|all|nil) "nil"
>
<!ELEMENT Book      (Title, Author) >
<!ELEMENT Title     (#PCDATA) >
<!ELEMENT Author    (#PCDATA) >
]>
<Books>
    <Book Subject='astronomy'>
        <Title>Cosmos</Title>
        <Author>Carl Sagan</Author>
    </Book>
    <Book Subject='fiction' Age-Group='all'>
        <Title>The Alchemist</Title>
        <Author>Paulo Coelho</Author>
    </Book>
</Books>

A DTD can't enforce that every fiction book has an Age-Group
value other than nil, but you wrote "If a book is fiction, I
have data about the age-group for which it is intended: teen,
young adult, adult, or all."  To me that means that we don't
need to be concerned about cases of

    <Book Subject='fiction' Age-Group='nil'>.

If we ever did need to be concerned about it, it's easy to match
that combo for rejection, deletion, or correction.
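
One way to do that matching, as a minimal sketch: it assumes
Python's standard xml.etree.ElementTree, and the function name
and the third sample book are made up for illustration.  Because
a parser may or may not supply the DTD's default value, a missing
Age-Group attribute is treated the same as an explicit 'nil'.

```python
import xml.etree.ElementTree as ET


def fiction_without_age_group(xml_text):
    """Return titles of fiction books whose Age-Group is 'nil' or absent."""
    root = ET.fromstring(xml_text)
    flagged = []
    for book in root.iter("Book"):
        # Fall back to the DTD's declared default when the attribute is missing.
        age_group = book.get("Age-Group", "nil")
        if book.get("Subject") == "fiction" and age_group == "nil":
            flagged.append(book.findtext("Title"))
    return flagged


# The last Book is a hypothetical bad record added for this example.
SAMPLE = """<Books>
    <Book Subject='astronomy'>
        <Title>Cosmos</Title>
        <Author>Carl Sagan</Author>
    </Book>
    <Book Subject='fiction' Age-Group='all'>
        <Title>The Alchemist</Title>
        <Author>Paulo Coelho</Author>
    </Book>
    <Book Subject='fiction' Age-Group='nil'>
        <Title>Example Novel</Title>
        <Author>A. N. Author</Author>
    </Book>
</Books>"""

print(fiction_without_age_group(SAMPLE))  # ['Example Novel']
```

The flagged titles can then be rejected, deleted, or corrected,
whichever suits the workflow.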


--Ernest


On Tue, 13 Jan 2015, Costello, Roger L. wrote:

> Date: Tue, 13 Jan 2015 05:50:55
> From: "Costello, Roger L." <costello@mitre.org>
> To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
> Subject: [xml-dev] XML Design: a series of an item or a collection of specific
>      types of items?
> 
> Hi Folks,
> 
> I have data about Books. 
> 
> Some books are fiction and some are non-fiction.
> 
> Regardless of whether a book is fiction or non-fiction, I have title and author information.
> 
> If a book is fiction, I have data about the age-group for which it is intended: teen, young adult, adult, or all.
> 
> If a book is non-fiction, I have data about its field: math, chemistry, physics, or astronomy.
> 
> Below are two designs. When would one design be preferred over the other? What factors would push you toward adopting one design over the other?
> 
> Design #1 - Series of Book Elements
> 
> Here is an example to illustrate this design:
> 
> <Books>
>     <Book>
>         <Type>Non-Fiction</Type>
>         <Field>Astronomy</Field>
>         <Title>Cosmos</Title>
>         <Author>Carl Sagan</Author>
>     </Book>
>     <Book>
>         <Type>Fiction</Type>
>         <Age-Group>All</Age-Group>
>         <Title>The Alchemist</Title>
>         <Author>Paulo Coelho</Author>
>     </Book>
> </Books>
> 
> There is a series of <Book> elements. The <Type> element identifies the kind of book. The data that is common to all books - Type, Title, and Author - is included in each Book element. The data that is unique to non-fiction books - Field - is only included in the non-fiction Book element.  The data that is unique to fiction books - Age-Group - is only included in the fiction Book element.  
> 
> A grammar for this design specifies that Books contain any number of Book elements:
> 
> Books --> Book+
> 
> The grammar rule for Book mandates the common elements - Type, Title, and Author - and makes the type-specific elements - Field and Age-Group - optional:
> 
> Book --> Type, Field?, Age-Group?, Title, Author
> 
> An unfortunate aspect of this design is that someone creating an XML instance document could accidentally create a non-fiction book that includes the Age-Group element. We could use Schematron to prevent this.
> 
> The beauty of this design is that a query for Books will return all the Books. If, in the future, there are also, say, books of type History and Philosophy, then the query will still work. Thus, this design is extensible, at least from a query perspective.
> 
> (Personally, I like the simple, repetitive nature of this design. And I am not concerned about the need for adding an additional layer of Schematron validation because I typically supplement grammar-based validation with Schematron co-constraint validation.)
> 
> Design #2 - Collection of Fiction and Non-Fiction Elements
> 
> Here is an example to illustrate this design:
> 
> <Books>
>     <Non-Fiction>
>         <Field>Astronomy</Field>
>         <Title>Cosmos</Title>
>         <Author>Carl Sagan</Author>
>     </Non-Fiction>
>     <Fiction>
>         <Age-Group>All</Age-Group>
>         <Title>The Alchemist</Title>
>         <Author>Paulo Coelho</Author>
>     </Fiction>
> </Books>
> 
> The content of the Books element is a repeatable choice of either a Non-Fiction element or a Fiction element. 
> 
> A grammar for this design specifies that Books contains a repeatable choice of either a Non-Fiction element or a Fiction element:
> 
> Books --> (Non-Fiction | Fiction)+
> 
> The grammar rule for Non-Fiction mandates the common elements - Title and Author - and mandates its type-specific element - Field:
> 
> Non-Fiction --> Field, Title, Author
> 
> The grammar rule for Fiction mandates the common elements - Title and Author - and mandates its type-specific element - Age-Group:
> 
> Fiction --> Age-Group, Title, Author
> 
> An unfortunate aspect of this design is that querying for all Books might not be easy, especially if there are, say, Magazine and CD elements mixed in with the Fiction and Non-Fiction elements. The query would have to call out each type of book: "Give me all Non-Fiction and Fiction elements." If, in the future, there are also, say, History and Philosophy elements, then the query will have to be modified. Thus, this design is not extensible, at least from a query perspective.
> 
> The beauty of this design is that there is no concern for someone accidentally creating a non-fiction book that includes the Age-Group element.
> 
> (Personally, I don't like this design. The non-repetitiveness is bothersome to me. And it requires additional grammar rules, which makes things more complicated.)
> --------------------------------
> 
> Okay, I'd like to hear your thoughts. From your experience, how do the two designs impact querying? Application processing? Schematron development? Etc.? Which design would you choose? Why?
> 
> /Roger