You have identified one set of issues that concerns me regarding the use of
XML for data systems, and I agree with your view of how difficult such
systems are to implement, maintain and support.
But your conclusion that "...but I have to assume that
(pulling numbers out of the air) a 3-way Join of hierarchical document
collections will be more practical than 100-way joins across normalized
relations containing the components of complex documents such as aircraft
maintenance manuals...." causes me some concern. Specifically:
- it works the other way, i.e. a 3-way outer join on normalized data is more
effective than 1 join for every element in every hierarchical document,
where you might have several elements in hundreds or thousands of documents
- assuming the 3-way join on XML docs is the same question as the 100-way
join (and how can that be the case if the relational data is well designed,
or could be mapped to the XML docs?), the 100-way join can use optimization
facilities existing in database products such as Oracle that do not exist
for XML docs
- components of complex documents exhibit increasing complexity over time,
i.e. it is not a static system, but rather a dynamic one. So while the
100-way join will always be a 100-way join, the 3-way join is highly likely
to become a 300,000-way join over time, or an exponential growth in
complexity over time for the non-normalized, non-relational forms.
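To make the first point concrete, here is a minimal sketch in Python with the standard sqlite3 module. It stores the customer/order data quoted further down as normalized relations and answers "who ordered what" with one 3-way outer join. The table and column names, the surrogate integer keys, and the second customer ("COMPANY Y", added only to show the outer join keeping a customer with no orders) are my own assumptions for illustration and appear nowhere in the thread.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding the example as three relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical schema: names and surrogate keys are assumptions.
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, town TEXT);
        CREATE TABLE part (id INTEGER PRIMARY KEY, code TEXT UNIQUE);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            part_id INTEGER REFERENCES part(id),
            quantity INTEGER
        );
        INSERT INTO customer VALUES (1, 'COMPANY X', 'SOMEWHERE');
        -- Invented row: a customer with no orders, to exercise the outer join.
        INSERT INTO customer VALUES (2, 'COMPANY Y', 'ELSEWHERE');
        INSERT INTO part VALUES (1, 'ABC123');
        INSERT INTO part VALUES (2, 'ABC234');
        INSERT INTO orders VALUES (1, 1, 1, 2);
        INSERT INTO orders VALUES (2, 1, 2, 4);
    """)
    return conn


def orders_by_customer(conn):
    """One declarative 3-way outer join over every customer at once."""
    return conn.execute("""
        SELECT c.name, p.code, o.quantity
        FROM customer AS c
        LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
        LEFT OUTER JOIN part AS p ON p.id = o.part_id
        ORDER BY c.id, o.id
    """).fetchall()


if __name__ == "__main__":
    for row in orders_by_customer(build_db()):
        print(row)
```

The query text stays the same whether there are two customers or two million; how it is executed is left to the database's planner, which is the kind of optimization facility the second point refers to.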
The fatal assumption seems to me to be inherent in the perception of a
document as a printed page, a static physical object that does not change.
Once it is automated, as a relational data system or an XML document, this
assumption no longer holds true. Notations are added, links are added to
other documents, external references link into, or through, specific areas
or context references in the document and so on and so on.
While XML, as a child entity of SGML, might be well suited to static
document markups, I just cannot see how it is well suited to dynamic
document automation.
Next, I expect to hear folks say that it is not _meant_ or _intended_ to be
well suited to dynamic document automation, to which my reply is au
contraire: if you automate with XML, that is precisely the premise you
are relying on, namely that XML is a best-practice approach to dynamic
document automation.
Unlike a printed page, an automated document, like any other automated data
system, is dynamic and subject to change driven by external requirements
that are by definition in flux. Assuming that a static state anywhere in
the automated document process is acceptable is not valid IMHO.
Sure, you might be able to make it work today. Or even tomorrow. But
working for 20 years, or longer, is not likely to be viable because the
maintenance and additional work requirements are likely to change in as yet
unknown ways, driving costs that can be shown to be at least linear and
more likely exponentially increasing over time.
That kind of outcome is precisely what TQM and then PE (process
engineering) and now ISO 9000 and CMM have tried to avoid.
That kind of outcome is not uncommon among software or automation projects,
historically, and, sadly, at present.
That kind of outcome, a chaotic result, is typical of development processes
that do not employ scientific methods, or use proofs and hard tests where
results are measurable, reproducible, and predictable.
Now, of course, the exception occurs now and then: someone will reach into a
haystack of needles and pluck out precisely the needle needed, but that is
always within a limited scope, or known universe, and is much more likely
when the requirements are less rigorous and the lifecycle is shorter than
So, OK, that's my take on it. Problems arise most often from the
assumptions we do not realize we are making, or have not examined in due
course. Hence the gauzy or foggy feeling one gets from CMM: the point is to
identify problems before they become problems, and cure them long before they
exhibit negative effects.
Thanks for your response.
At 08:30 AM 8/20/2003 -0700, Mike Champion wrote:
>--- Rick Marshall <email@example.com> wrote:
> > <customer>
> >   <name>COMPANY X</name>
> >   <town>SOMEWHERE</town>
> >   <order>
> >     <part>ABC123</part>
> >     <quantity>2</quantity>
> >   </order>
> >   <order>
> >     <part>ABC234</part>
> >     <quantity>4</quantity>
> >   </order>
> > </customer>
> > just isn't going to be a relational form as there's
> > no way to determine
> > a priori what the normalised records are....
> > so without some semantics you can't represent
> > relational tables with the
> > natural tree structure of xml.
>Yup. The hierarchical approach that XML supports
>allows you to not worry about the sometimes
>challenging problem of figuring out what the keys
>would be in a normalization that will allow you to get
>back the information you put in. It's sort of like the
>fox and hedgehog: the relational model has many
>tricks for defining relationships among components,
>but you have to be clever to use it well; XML has only
>one trick ("containment") but it's a pretty powerful
>one. Of course, not all data fit the "natural tree
>structure of XML" but a lot of interesting examples
>The downside, which I think is the point of this
>thread (I haven't read the whole thing!) is that XML's
>"one big trick" works best if the document as a whole
>is the unit of analysis and storage. Once you start
>composing compound documents out of individual
>entities or need to update specific
>elements/attributes inside an entity, things start to
>get very ugly and there's little in the way of a
>theoretical model such as Codd developed to guide you.
>For example, there is a more or less irresolvable
>muddle between the XML syntax level model of entity
>declarations and references and the
>Infoset/XPath/XQuery model in which these are assumed
>to have been resolved. (DOM tries to play on both
>sides of the street, but that part of its conceptual
>model is very ugly).
>XQuery is probably a great breakthrough here by
>allowing both the implicit containment relationships
>that the relational model lacks and allowing documents
>to be composed by a Join operation on shared values,
>which AFAIK is the most profoundly powerful aspect of
>the RM. Whether XQuery implementations can be written
>in a way so as to make this practical for
>terabyte-scale databases is yet to be seen ... but I
>have to assume that (pulling numbers out of the air) a
>3-way Join of hierarchical document collections will
>be more practical than 100-way joins across normalized
>relations containing the components of complex
>documents such as aircraft maintenance manuals.
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>The list archives are at http://lists.xml.org/archives/xml-dev/