  • To: xml-dev@lists.xml.org
  • Subject: Re: [xml-dev] Normalizing XML [was: XML information modeling best practices]
  • From: Ronald Bourret <rpbourret@rpbourret.com>
  • Date: Wed, 01 May 2002 00:31:04 -0700
  • References: <000801c1f03e$ad16dd80$6501a8c0@pcukmka>

Michael Kay wrote:
> 
> > First normal form:
> > ------------------
> > Data is in first normal form if it (a) has a primary key and
> > (b) has no repeating fields.
> 
> Actually, *data* can't be in first normal form - only *relations* can.

You're correct -- poor wording on my part.

> So
> applying the concept to a data model that doesn't use relations is pretty
> dicey.

Agreed.

What I'm curious about is whether there is a concept analogous to
normalization that can be applied to the XML data model. This might be
useful in proposing XML information modeling best practices, which was
Simon's original goal.

> Obviously (b) doesn't have any relevance to a hierarchic data model,

That depends on how you interpret "repeating fields." If you choose to
view B in:

   <!ELEMENT A (B+)>

as a (pardon the expression) single, multi-valued attribute, then the
above content model has no repeating fields while the following content
model does:

   <!-- B1 and B2 represent the same real-world entity -->
   <!ELEMENT A (B1, B2)>
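To make the distinction concrete, here is a rough sketch of how one might flag the second kind of "repeating field" in an instance document. It is a hypothetical heuristic, not an established rule: sibling elements whose names differ only by a trailing number (B1, B2) are reported, while a run of identically named B elements is treated as a single multi-valued field and left alone.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def repeating_field_groups(element):
    """Group child element names that differ only by a trailing number.

    Hypothetical heuristic: siblings such as B1, B2 are flagged as a
    "repeating field"; a run of identically named B elements counts as
    one multi-valued field and is not flagged.
    """
    groups = defaultdict(set)
    for child in element:
        match = re.fullmatch(r"(.*?)(\d+)", child.tag)
        if match:
            groups[match.group(1)].add(child.tag)
    # Only a stem with two or more distinct numbered names counts as a repeat.
    return {stem: sorted(names) for stem, names in groups.items() if len(names) > 1}


multi_valued = ET.fromstring("<A><B>x</B><B>y</B><B>z</B></A>")
numbered = ET.fromstring("<A><B1>x</B1><B2>y</B2></A>")

print(repeating_field_groups(multi_valued))  # {}
print(repeating_field_groups(numbered))      # {'B': ['B1', 'B2']}
```

A name-based check like this can only suggest candidates; whether B1 and B2 really represent the same real-world entity is a modeling judgment that the markup alone does not record.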

> > In XML terms, this
> > implies that you only store one "thing" per document
> 
> You've made a magic jump from "data" being normalized to "documents" being
> normalized, and you seem to be assuming that a document should represent one
> tuple in a relation - that's a mighty big jump.

That was my intention from the start, but I obviously wasn't clear about
it.

> Yes, [3NF] does apply to XML, but it certainly doesn't tell us how to split our
> data into multiple documents. It does tell us how to design our hierarchies,
> but not how to partition those hierarchies across documents.

I think it does tell us how to partition documents. It says that data
shared across multiple documents needs to reside in separate documents.
Sales orders are a bad example here, since the "XML normalized" form is
virtually identical to the relational normalized form. Semi-structured
data is probably a better example, since documents containing
semi-structured data are likely to differ substantially from their
relational counterparts.
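A minimal sketch of what "shared data resides in separate documents" could mean in practice, assuming orders that each embed a full copy of their customer. The element names, the customerRef attribute, and the choice to key documents by id are all hypothetical. Each distinct customer is stored once as its own document, and each order keeps only a reference to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: each order embeds a full copy of its customer.
ORDERS = (
    "<orders>"
    '<order id="o1"><customer id="c1"><name>Acme</name></customer><item>bolts</item></order>'
    '<order id="o2"><customer id="c1"><name>Acme</name></customer><item>nuts</item></order>'
    "</orders>"
)


def partition(orders_xml):
    """Move shared <customer> elements into documents of their own.

    Returns (order_docs, customer_docs), both dicts of serialized XML keyed
    by id. Each order keeps only a customerRef attribute pointing at its
    customer document.
    """
    root = ET.fromstring(orders_xml)
    order_docs, customer_docs = {}, {}
    for order in root.findall("order"):
        customer = order.find("customer")
        cid = customer.get("id")
        # Store each customer once, however many orders share it.
        customer_docs.setdefault(cid, ET.tostring(customer, encoding="unicode"))
        order.remove(customer)
        order.set("customerRef", cid)
        order_docs[order.get("id")] = ET.tostring(order, encoding="unicode")
    return order_docs, customer_docs


orders, customers = partition(ORDERS)
print(len(orders), len(customers))  # 2 1
```

Updating the customer's name now touches one document instead of every order that mentions it, which is the update problem normalization is meant to remove.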

> But all this presupposes that we are designing XML documents for storage and
> query. Most XML documents are designed for messaging of some kind (between
> humans or between software components). Within the context of a message,
> duplication is far less of a problem, for example it doesn't matter if I
> hold product code, description, and price as part of each order-line in an
> order. Many XML databases are actually archives of such messages, so
> duplication of data is a fact of life; and since it's an archive, the update
> problem doesn't arise.

This is the conclusion I came to.
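For contrast, here is a sketch of the messaging case Michael describes, under assumed names: a hypothetical in-memory product catalogue and made-up element names. When a message is built, product code, description, and price are deliberately copied into every order line, so the message stands on its own and can be archived without the catalogue.

```python
import xml.etree.ElementTree as ET

# Hypothetical normalized product catalogue, keyed by product code.
PRODUCTS = {
    "P100": {"description": "Hex bolt", "price": "0.25"},
    "P200": {"description": "Hex nut", "price": "0.10"},
}


def build_order_message(order_id, lines):
    """Build a self-contained (denormalized) order message.

    `lines` is a list of (product_code, quantity) pairs. Product details are
    copied into every order line, even when a product appears more than once.
    """
    order = ET.Element("order", id=order_id)
    for code, quantity in lines:
        product = PRODUCTS[code]
        line = ET.SubElement(order, "orderLine", quantity=str(quantity))
        ET.SubElement(line, "productCode").text = code
        ET.SubElement(line, "description").text = product["description"]
        ET.SubElement(line, "price").text = product["price"]
    return ET.tostring(order, encoding="unicode")


print(build_order_message("o1", [("P100", 40), ("P200", 40), ("P100", 10)]))
```

The duplication is harmless here because an archived message records what was sent at that moment; nobody later updates the price inside it.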

-- Ron

Copyright 2001 XML.org. This site is hosted by OASIS