
   Re: [xml-dev] 3 XML Design Principles


I think to a certain extent that you're applying Ted Codd's "12 rules
of database design" to XML - in essence, what you've done here is
give a number of key normalization rules in XML. The challenge that I
find when attempting to do this is that such normalization can
only occur in situations where there is what I term a low complexity
to the schema. I address this in a distinctly un-user-friendly paper
called "Information Loss and Schema Complexity"
(http://www.understandingxml.com/archives/2005/01/information_los.html),
which represents some thoughts I've had on getting a handle on the
potentially complex nature of XML. It's pretty heavy reading, though
I admit that I backed away from going to a more formal mathematical
notation when I realized just how intimidating it looked.

Keep in mind that in areas where you have complex documents (ones in
which there are a large number of potential states for a given
sequence of nodes in the document tree), normalization becomes much
more difficult to maintain. Similarly, the more you normalize
content, the more hierarchical restructuring becomes necessary
when transforming that content into other representations
(especially in terms of filters where you have multiple distinct flat
structures that may have deep referential relationships).
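
As a rough illustration of that restructuring cost, here is a minimal
Python sketch. The document, the element names (library, author, book)
and the authorRef attribute are all hypothetical, invented for this
example and not taken from the schema under discussion. It takes two
flat, normalized lists linked by an id reference and rebuilds the
container/contained view, which means resolving every reference by hand:

```python
import xml.etree.ElementTree as ET

# Hypothetical normalized document: two flat lists linked by an id reference.
NORMALIZED = """
<library>
  <authors>
    <author id="a1"><name>Ann</name></author>
    <author id="a2"><name>Bob</name></author>
  </authors>
  <books>
    <book authorRef="a1"><title>First</title></book>
    <book authorRef="a2"><title>Second</title></book>
    <book authorRef="a1"><title>Third</title></book>
  </books>
</library>
"""


def nest_books_under_authors(xml_text):
    """Rebuild a container/contained view from the flat, normalized form."""
    source = ET.fromstring(xml_text)
    result = ET.Element("library")

    # Copy each author into the output and index the copy by its id,
    # so that references can be resolved in the next pass.
    nested = {}
    for author in source.findall("./authors/author"):
        copy = ET.SubElement(result, "author", {"name": author.findtext("name")})
        nested[author.get("id")] = copy

    # Every book has to be re-parented by following its reference.
    for book in source.findall("./books/book"):
        parent = nested[book.get("authorRef")]
        ET.SubElement(parent, "book").text = book.findtext("title")

    return result


if __name__ == "__main__":
    print(ET.tostring(nest_books_under_authors(NORMALIZED), encoding="unicode"))
```

With a single reference the extra pass is trivial; the point is that each
additional flat structure and each deeper chain of references adds
another index and another re-parenting step of this kind.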

This is not to say that the rules that you're expressing aren't
important - in general, you are exactly correct in saying that the
best form that a schema can take (what I refer to as a canonical form
in the same paper)  is one where fields are effectively decoupled as
much as possible, and where relationships are explicit. In a simple
schema as you propose above, this canonicalization is obvious, but in
a high complexity schema (such as a DocBook document) such
canonicalization would be both expensive and largely irrelevant, as it
is precisely the relationships themselves - the container/contained
relationships in particular - that ARE the critical parts of the

-- Kurt Cagle
-- UnderstandingXML



