OASIS Mailing List Archives





   Re: [xml-dev] A standard approach to glueing together reusable XML fragm


At 05:44 PM 8/20/2003 +1000, you wrote:

>On Tue, Aug 19, 2003 at 04:48:08PM -0400, dbexlist wrote:
> > I like what I see in TeraText, from their web site, but none of the
> > situations of which I am aware can afford to treat the data elements, or
> > XML data items, as text only. Every one of these applications has cause to
> > use relations between normal forms of the data elements, and to do advanced
> > indexing on various data types not just text, such as dates and date
> > ranges, numerical process results (averages, means, distributions, etc),
> > scientific enumerations and so on.
>Just to clarify, as one of the TeraText developers I should note that the
>TeraText DBS can store and index data not just as SGML or XML or MARC data,
>but also as primitive types such as dates, durations, integers, floats,
>booleans, and Unicode/ASCII strings.  These can be repeating, combined in
>user-definable, recursive structures, or can be used to populate dynamically
>calculated fields.  So it's not just raw XML. :-)

News to me, but good to hear.

> > Gov't docs are often like that - they are heavily laden with text or prose,
> > but also have significant valuations in other data types including math
> > equations with all sorts of notation formats or other readings such as
> > pollution indexes from the EPA, or farm crop estimates vs. harvests by crop
> > by month by county by year, or rainfall vs. temperature over time for each
> > day by GPS coordinate areas, etc. etc.
>Yes, absolutely.  It's really common for applications to want to directly
>store lists of keywords, dates, durations, etc. in a record, along with
>well-formed or valid XML.
> > In other words, the TeraText approach does not seem to support relations
> > between normal forms, and so seems to have a self-imposed design limit that
> > I, personally, find short of desirable. It is not just about massive data
> > handling, but also about being able to do things with that data after it
> > has been captured and has existed for some time, things that support
> > requirements that are not yet known. In my opinion, only normal forms and
> > relational theory or the relational model (RM) offer this capability.
>Yes; building chains from one piece of information to another can
>be invaluable, particularly with intelligence problems.  To that end,
>the TeraText DBS has the ability to index specific relationships between
>records in different databases; a bit like pre-computed joins.
>For particular kinds of applications, this is often precisely what's
>needed.  True, it's not the same as having a relational database, but
>if one has several 100GB of genuinely relational data one can always
>attempt to manage it with [a leading RDBMS]. :-)

The situation presented to me was that a high growth (10% / yr or more),
very large data store (terabytes of prose plus terabytes of data, plus
streaming media) is _best_ implemented in pure XML or an XML only
structure, even though the processes using this data require relations
on normal forms, self-joins, inner-joins, outer-joins, full corpus searches
of some complexity, and versioning of documents. My response was that maybe
it could be done, but XML only was not the best way to quickly achieve low
cost (both initial and maintenance / operational), high reliability, and
high flexibility on off the shelf hardware (Sun servers at most). It does
not seem to me that data of this size and scope can be managed in anything
other than [a leading RDBMS], though perhaps it can be built in TeraText or
another similar product line.
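
To make those join requirements concrete, here is a minimal sketch using
Python's standard sqlite3 module. The tables, columns, and rows are
hypothetical, invented only for illustration, and SQLite merely stands in
for whichever RDBMS is actually chosen.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with two small, normalized tables.

    All names and rows are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE county (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            neighbor_id INTEGER REFERENCES county(id)
        );
        CREATE TABLE harvest (
            county_id INTEGER REFERENCES county(id),
            crop TEXT,
            year INTEGER,
            tons REAL
        );
        INSERT INTO county VALUES (1, 'Adams', 2), (2, 'Baker', 1), (3, 'Clark', NULL);
        INSERT INTO harvest VALUES (1, 'corn', 2002, 120.0), (2, 'corn', 2002, 95.5);
    """)
    return conn


def inner_join(conn):
    """Counties that have at least one harvest row."""
    return conn.execute(
        "SELECT c.name, h.tons FROM county c "
        "JOIN harvest h ON h.county_id = c.id ORDER BY c.name"
    ).fetchall()


def outer_join(conn):
    """Every county, with NULL tons where no harvest row exists."""
    return conn.execute(
        "SELECT c.name, h.tons FROM county c "
        "LEFT JOIN harvest h ON h.county_id = c.id ORDER BY c.name"
    ).fetchall()


def self_join(conn):
    """Pair each county with its neighbor by joining the table to itself."""
    return conn.execute(
        "SELECT a.name, b.name FROM county a "
        "JOIN county b ON a.neighbor_id = b.id ORDER BY a.name"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(inner_join(conn))
    print(outer_join(conn))
    print(self_join(conn))
```

The point is only that each of these is a one-line declarative query once
the data is in normal form; nothing here speaks to performance at terabyte
scale.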

The key word here is "managed". Massive data stores like this take on a
life of their own in my experience, and gain their own momentum and dynamics
with an ever-increasing list of dependent systems or processes. This makes
them difficult to manage. I just don't see the tool set in TeraText that I
see in, say, Oracle.

For the sake of discussion I am willing to stipulate that TeraText, or [a
leading XML only vendor], can do everything Oracle can do, though my
experience is that this is emphatically _not_ the case. There are still
serious concerns with an XML only approach. Specifically, my gut feeling is
that a pure XML approach carries a significant risk, or a certainty, of
n-modifications being driven by y-permutations of z changes across static
schemas and into XML docs (whether record oriented or data oriented).
In other words, it seems to me that XML maintenance work will grow
exponentially over time, while [a leading RDBMS] maintenance work remains
linear or less than linear with respect to the baseline level of effort.
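
To show what the linear versus exponential distinction means in practice,
here is a minimal Python sketch. The function names and every number in it
are hypothetical, chosen only for illustration; they are not measurements
from any real system.

```python
def linear_effort(base, step, years):
    """Yearly effort that grows by a fixed amount `step` each year."""
    return [base + step * y for y in range(years)]


def compounding_effort(base, rate, years):
    """Yearly effort that grows by a fixed fraction `rate` each year."""
    return [base * (1 + rate) ** y for y in range(years)]


def first_year_exceeding(compounding, linear):
    """Index of the first year where compounding effort exceeds linear, or None."""
    for year, (c, l) in enumerate(zip(compounding, linear)):
        if c > l:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical inputs: both curves start at 1 unit of effort per year.
    lin = linear_effort(1.0, 0.5, 10)
    comp = compounding_effort(1.0, 0.5, 10)
    for year, (l, c) in enumerate(zip(lin, comp)):
        print(f"year {year}: linear={l:.2f} compounding={c:.2f}")
```

Whatever starting values are picked, a compounding curve eventually
overtakes a linear one, which is the heart of my worry.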

It worries me to see PTO and other efforts proceeding without apparent
consideration of the specific, well documented, and very difficult to
resolve issues that drove the development of Relational Theory and the
Relational Model (RM), way back when. I agree with the position taken by
others that if SQL adhered to and fully supported RM, SQL maintenance
issues would be exponentially less than they are currently, and I have the
same sentiments towards XML: if it fully supports RM then we can
reasonably expect lower maintenance and support costs over time; if it does
not support RM then we can reasonably expect escalating maintenance and
support costs over time. Exponentially escalating costs are highly
undesirable in my opinion.

Unless someone can show me how XML or an XML only tool set such as TeraText
supports and fulfills RM, exponentially increasing maintenance effort
will remain a serious concern for me.
The issues that drove the development of RM have not gone away, and are
very apparent to me in many, or all, of the XML discussions I read, though
different language is used.

One does not have to look far to see a plethora of examples, in business or
the public sector, of high maintenance costs associated with
state-of-the-art XML systems. Plenty of published data supports the
concern that maintenance costs are escalating at a non-linear rate for
the vast majority of XML systems, even though most of
these systems are not XML only solutions.

Theory and practice always differ, but I would like to see proof that high
maintenance costs, escalating over time, are not the normal evolutionary
path for almost all XML systems.

It is not the normal practice for budgets to allocate funds exceeding the 
original application cost, year after year, escalating over time, for 
maintenance work on existing applications, in my opinion. Nor is this the 
result expected by senior or high level management.

In practice, in real world practical applications, well designed DBMS
systems that approach RM require at most 1/100th of their original
development costs in maintenance expenditures on an annual basis. If an XML
approach cannot offer a better result at a lower cost over the lifetime of
the application, then I submit that the only ethically and morally valid
approach (that is to say the only Professional approach) in the context of
private sector economics or public sector economics is a [leading RDBMS
vendor] product.



>Multimedia Databases Group, RMIT, Australia.



Copyright 2001 XML.org. This site is hosted by OASIS