- To: email@example.com
- Subject: Re: [xml-dev] Normalizing XML [was: XML information modeling best practices]
- From: Ronald Bourret <firstname.lastname@example.org>
- Date: Wed, 01 May 2002 00:31:04 -0700
- References: <000801c1f03e$ad16dd80$6501a8c0@pcukmka>
Michael Kay wrote:
> > First normal form:
> > ------------------
> > Data is in first normal form if it (a) has a primary key and
> > (b) has no repeating fields.
> Actually, *data* can't be in first normal form - only *relations* can.
You're correct -- poor wording on my part.
> applying the concept to a data model that doesn't use relations is pretty
What I'm curious about is whether there is a concept analogous to
normalization that can be applied to the XML data model. This might be
useful in proposing XML information modeling best practices, which was
Simon's original goal.
> Obviously (b) doesn't have any relevance to a hierarchic data model,
That depends on how you choose to interpret what "repeating fields"
means. If you choose to view B in:
<!ELEMENT A (B+)>
as a (pardon the expression) single, multi-valued attribute, then the
above content model has no repeating fields, while the following content
model does:
<!-- B1 and B2 represent the same real-world entity -->
<!ELEMENT A (B1, B2)>
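To make the distinction concrete, here is a minimal sketch in Python of a check that treats a repeated element (B+) as one multi-valued field and flags numbered siblings (B1, B2) as repeating fields. The function name numbered_repeats and the rule "same base name plus a numeric suffix" are assumptions made for illustration only; real schemas can hide repeating fields behind unrelated names, which no name-based check will catch.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def numbered_repeats(parent):
    """Return {base_name: [tags]} for child elements that look like
    numbered copies of one field (B1, B2, ...).

    Hypothetical heuristic: a tag counts as a numbered copy if it is a
    non-empty base name followed by digits, and at least two siblings
    share that base name.
    """
    groups = defaultdict(list)
    for child in parent:
        match = re.fullmatch(r"(.*?)(\d+)", child.tag)
        if match and match.group(1):
            groups[match.group(1)].append(child.tag)
    return {base: tags for base, tags in groups.items() if len(tags) > 1}


# <!ELEMENT A (B+)>: B viewed as a single multi-valued field
multi_valued = ET.fromstring("<A><B>x</B><B>y</B></A>")

# <!ELEMENT A (B1, B2)>: the same real-world entity under two names
repeating = ET.fromstring("<A><B1>x</B1><B2>y</B2></A>")

print(numbered_repeats(multi_valued))
print(numbered_repeats(repeating))
```

Under this interpretation only the second instance is reported, which is the sense in which (B1, B2) violates the "no repeating fields" rule and (B+) does not.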
> > In XML terms, this
> > implies that you only store one "thing" per document
> You've made a magic jump from "data" being normalized to "documents" being
> normalized, and you seem to be assuming that a document should represent one
> tuple in a relation - that's a mighty big jump.
That was my intention from the start, but I obviously wasn't clear about
it.
> Yes, [3NF] does apply to XML, but it certainly doesn't tell us how to split our
> data into multiple documents. It does tell us how to design our hierarchies,
> but not how to partition those hierarchies across documents.
I think it does tell us how to partition documents. It says that data
shared across multiple documents needs to reside in separate documents.
Sales orders are a bad example here, since the "XML normalized" form is
virtually identical to the relational normalized form. Semi-structured
data is probably a better example, since documents containing
semi-structured data are likely to differ substantially from their
relational counterparts.
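As a rough sketch of what "shared data resides in separate documents" could look like in practice, the Python below moves an element that several documents embed into a document of its own and leaves a reference element behind. Everything named here is hypothetical: the split_shared function, the Article/Author vocabulary, the id attribute used as a key, and the "Ref" suffix are illustrative choices, not an established convention.

```python
import copy
import xml.etree.ElementTree as ET


def split_shared(docs, shared_tag, key_attr):
    """Move every shared_tag element out of docs into its own document.

    Returns {key: element}, one entry per distinct key. In the original
    documents each occurrence is replaced by an empty <shared_tag>Ref
    element carrying the same key attribute (a hypothetical convention).
    """
    shared = {}
    for doc in docs:
        # Snapshot the tree first so replacing children is safe.
        for parent in list(doc.iter()):
            for index, child in enumerate(list(parent)):
                if child.tag != shared_tag:
                    continue
                key = child.get(key_attr)
                # Keep the first copy seen; later copies are duplicates.
                shared.setdefault(key, copy.deepcopy(child))
                ref = ET.Element(shared_tag + "Ref", {key_attr: key})
                ref.tail = child.tail
                parent[index] = ref
    return shared


a1 = ET.fromstring(
    '<Article><Author id="a1"><Name>Example</Name></Author>'
    "<Body>one</Body></Article>"
)
a2 = ET.fromstring(
    '<Article><Author id="a1"><Name>Example</Name></Author>'
    "<Body>two</Body></Article>"
)

authors = split_shared([a1, a2], "Author", "id")
print(ET.tostring(a1, encoding="unicode"))
print(ET.tostring(authors["a1"], encoding="unicode"))
```

The sketch assumes the embedded copies are identical, so keeping the first one loses nothing. If they can disagree, that is exactly the update anomaly normalization is meant to prevent, and it would have to be resolved before splitting.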
> But all this presupposes that we are designing XML documents for storage and
> query. Most XML documents are designed for messaging of some kind (between
> humans or between software components). Within the context of a message,
> duplication is far less of a problem, for example it doesn't matter if I
> hold product code, description, and price as part of each order-line in an
> order. Many XML databases are actually archives of such messages, so
> duplication of data is a fact of life; and since it's an archive, the update
> problem doesn't arise.
This is the conclusion I came to.