   XML CMM ISO9000 compliance? - was A standard approach to glueing togethe


You have identified one set of factors and issues that concerns me 
regarding the use of XML for data systems, and I agree with your view of 
how difficult such systems are to implement, maintain and support.

But your conclusion that "...but I have to assume that 
(pulling numbers out of the air) a 3-way Join of hierarchical document 
collections will be more practical than 100-way joins across normalized 
relations containing the components of complex documents such as aircraft 
maintenance manuals...." causes me some concerns. Specifically:
- it works the other way, i.e. a 3-way outer join on normalized data is more 
effective than 1 join for every element in every hierarchical document, 
where you might have several elements in hundreds or thousands of documents.
- assuming the 3-way join on XML docs is the same question as the 100-way 
join (and how can that be the case if the relational data is well designed, 
or could be mapped to the XML docs?), the 100-way join can use optimization 
facilities existing in database products such as Oracle that do not exist 
for XML docs.
- components of complex documents exhibit increasing complexity over time, 
i.e. it is not a static system, but rather a dynamic system. So while the 
100-way join will always be a 100-way join, the 3-way join is highly likely 
to become a 300,000-way join over time, i.e. an exponential growth in 
complexity over time for the non-normalized, non-relational forms.
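
To make the first point concrete, here is a minimal sketch of a fixed 3-way 
join over normalized data, using Python's built-in sqlite3 module. The 
table names, columns and sample rows (manual, section, step) are 
hypothetical and invented purely for illustration; nothing here comes from 
a real maintenance-manual schema, and it uses inner joins for brevity.

```python
import sqlite3

# Hypothetical normalized schema: a manual has sections, a section has steps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manual  (manual_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE section (section_id INTEGER PRIMARY KEY,
                          manual_id  INTEGER REFERENCES manual(manual_id),
                          heading    TEXT);
    CREATE TABLE step    (step_id    INTEGER PRIMARY KEY,
                          section_id INTEGER REFERENCES section(section_id),
                          body       TEXT);
""")
conn.execute("INSERT INTO manual VALUES (1, 'Hypothetical maintenance manual')")
conn.executemany("INSERT INTO section VALUES (?, 1, ?)",
                 [(1, "Landing gear"), (2, "Hydraulics")])
conn.executemany("INSERT INTO step VALUES (?, ?, ?)",
                 [(1, 1, "Inspect strut"),
                  (2, 1, "Check tyre pressure"),
                  (3, 2, "Bleed lines")])


def steps_for(conn, title):
    """Return (heading, body) pairs for one manual via a single 3-way join."""
    return conn.execute(
        """
        SELECT s.heading, st.body
        FROM manual m
        JOIN section s  ON s.manual_id   = m.manual_id
        JOIN step    st ON st.section_id = s.section_id
        WHERE m.title = ?
        ORDER BY s.section_id, st.step_id
        """,
        (title,),
    ).fetchall()


rows = steps_for(conn, "Hypothetical maintenance manual")
for heading, body in rows:
    print(heading, "|", body)
```

The query text stays the same however many manuals, sections or steps are 
added; only the row counts grow, and that is the part a database optimizer 
is built to handle.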

The fatal assumption seems to me to be inherent in the perception of a 
document as a printed page, a static physical object that does not change. 
Once it is automated, as a relational data system or an XML document, this 
assumption no longer holds true. Notations are added, links are added to 
other documents, external references link into, or through, specific areas 
or context references in the document and so on and so on.

While XML, as a child entity of SGML, might be well suited to static 
document markups, I just cannot see how it is well suited to dynamic 
document automation.

Next, I expect to hear folks say that it is not _meant_ or _intended_ to be 
well suited to dynamic document automation, to which my reply is au 
contraire..... if you automate with XML that is precisely the premise you 
are utilizing.... that XML is a best practice approach to dynamic document 
automation.

Unlike a printed page, an automated document, like any other automated data 
system, is dynamic and subject to change driven by external requirements 
that are by definition in flux. Assuming that a static state anywhere in 
the automated document process is acceptable is not valid IMHO.

Sure, you might be able to make it work today. Or even tomorrow. But 
working for 20 years, or longer, is not likely to be viable because the 
maintenance and additional work requirements are likely to change in as yet 
unknown ways, driving costs that can be shown to be at least linear and 
more likely exponentially increasing over time.

That kind of outcome is precisely what TQM and then PE (process 
engineering) and now ISO 9000 and CMM have tried to avoid.

That kind of outcome is not uncommon among software or automation projects, 
historically, and, sadly, at present.

That kind of outcome, a chaotic result, is typical of development processes 
that do not employ scientific methods, or use proofs and hard tests where 
results are measurable, reproducible, and predictable.

Now, of course the exception occurs now and then, someone will reach into a 
haystack of needles and pluck out precisely the needle needed, but that is 
always within a limited scope, or known universe, and is much more likely 
when the requirements are less rigorous and the lifecycle is shorter than 
the norm.

So, ok, that's my take on it. Problems arise most often from the 
assumptions we do not realize we are making, or have not examined in proper 
course. Ergo the gauzy or foggy feeling one gets from CMM; the point is to 
identify problems before they are problems, and cure them long before they 
exhibit negative effects.

Thanks for your response.

At 08:30 AM 8/20/2003 -0700, Mike Champion wrote:
>--- Rick Marshall <rjm@zenucom.com> wrote:
> > <customer>
> >       <name>COMPANY X</name>
> >       <town>SOMEWHERE</town>
> >       <order>
> >               <part>ABC123</part>
> >               <quantity>2</quantity>
> >       </order>
> >       <order>
> >               <part>ABC234</part>
> >               <quantity>4</quantity>
> >       </order>
> > </customer>
> >
> > just isn't going to be a relational form as there's
> > no way to determine
> > a priori what the normalised records are....
> > so without some semantics you can't represent
> > relational tables with the
> > natural tree structure of xml.
>Yup.  The hierarchical approach that XML supports
>allows you to not worry about the sometimes
>challenging problem of figuring out what the keys
>would be in a normalization that will allow you to get
>back the information you put in.  It's sort of like the
>fox and hedgehog: the relational model has many
>tricks for defining relationships among components,
>but you have to be clever to use it well; XML has only
>one trick ("containment") but it's a pretty powerful
>one.  Of course, not all data fit the "natural tree
>structure of XML" but a lot of interesting examples
>do.
>The downside, which I think is the point of this
>thread (I haven't read the whole thing!) is that XML's
>"one big trick" works best if the document as a whole
>is the unit of analysis and storage.  Once you start
>composing compound documents out of individual
>entities or need to update specific
>elements/attributes inside an entity, things start to
>get very ugly and there's little in the way of a
>theoretical model such as Codd developed to guide you.
>For example, there is a more or less irresolvable
>muddle between the XML syntax level model of entity
>declarations and references and the
>Infoset/XPath/XQuery model in which these are assumed
>to have been resolved.  (DOM tries to play on both
>sides of the street, but that part of its conceptual
>model is very ugly).
>XQuery is probably a great breakthrough here by
>allowing both the implicit containment relationships
>that the relational model lacks and allowing documents
>to be composed by a Join operation on shared values,
>which AFAIK is the most profoundly powerful aspect of
>the RM.  Whether XQuery implementations can be written
>in a way so as to make this practical for
>terabyte-scale databases is yet to be seen ... but I
>have to assume that (pulling numbers out of the air) a
>3-way Join of hierarchical document collections will
>be more practical than 100-way joins across normalized
>relations containing the components of complex
>documents such as aircraft maintenance manuals.
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>The list archives are at http://lists.xml.org/archives/xml-dev/
>To subscribe or unsubscribe from this list use the subscription
>manager: <http://lists.xml.org/ob/adm.pl>
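
For reference, Rick's customer example above can be split into relational 
rows only once someone supplies a key from outside the document. The sketch 
below assumes a surrogate customer_id and a two-table layout; both are my 
own illustrative choices, not something the XML itself dictates. It uses 
only the Python standard library (xml.etree.ElementTree and sqlite3).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Rick's example document, reproduced from the quoted message.
DOC = """
<customer>
  <name>COMPANY X</name>
  <town>SOMEWHERE</town>
  <order><part>ABC123</part><quantity>2</quantity></order>
  <order><part>ABC234</part><quantity>4</quantity></order>
</customer>
"""


def shred(xml_text, customer_id):
    """Split one <customer> tree into a customer row and its order rows.

    customer_id is an assumed surrogate key supplied by the caller; the
    document carries no key of its own, which is Rick's point.
    """
    root = ET.fromstring(xml_text)
    customer = (customer_id, root.findtext("name"), root.findtext("town"))
    orders = [
        (customer_id, o.findtext("part"), int(o.findtext("quantity")))
        for o in root.findall("order")
    ]
    return customer, orders


customer, orders = shred(DOC, customer_id=1)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, town TEXT)"
)
conn.execute("CREATE TABLE orders (customer_id INTEGER, part TEXT, quantity INTEGER)")
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", customer)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# A join on the shared key puts the containment relationship back together.
rejoined = conn.execute(
    """
    SELECT c.name, o.part, o.quantity
    FROM customer c
    JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY o.part
    """
).fetchall()
print(rejoined)
```

The containment in the tree becomes an explicit shared value in the tables, 
and the choice of that value is exactly the semantics the document alone 
does not provide.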

