xml-dev - Re: [xml-dev] A standard approach to glueing together reusable XML frag


To: lbradshaw@dbex.com
Subject: Re: [xml-dev] A standard approach to glueing together reusable XML fragments in prose?
From: Mitch Amiano <mamiano@nc.rr.com>
Date: Tue, 19 Aug 2003 15:06:43 -0400
Cc: Sai Surya Kiran Evani <evani@informatik.uni-freiburg.de>, mitch.amiano@softwareadjuvant.com, xml-dev@lists.xml.org
In-reply-to: <5.1.1.5.2.20030819125723.030ebc78@mail.dbex.com>
Organization: Software Adjuvant
References: <3F410022.F6599E30@mitre.org> <3F410022.F6599E30@mitre.org> <5.1.1.5.2.20030819112123.00aa4b78@pop.earthlink.net> <5.1.1.5.2.20030819125723.030ebc78@mail.dbex.com>
Reply-to: mitch.amiano@softwareadjuvant.com
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030507

Ok, I'll just come out and say "that's stupid". 

"XML only" simply doesn't make sense. There is no such thing. 
It seems contrary to the whole point.
But maybe I'm in a minority.



lbradshaw@dbex.com wrote:

> 
> Ummm, the situation presented is that an XML-only solution set can
> provide the needed functionality, response time, and other capabilities.
> 
> No worries here that I can do this using a relational DBMS, normal 
> forms, etc. But doing it only using XML constructs, and tools specific 
> to XML, does not seem doable, reasonable, appropriate or supportable 
> over time.
> 
> Someone told me, verbally, and emphatically, that it is doable in XML 
> only constructs, to which I replied "Show me." and from whom no such 
> demonstration has been forthcoming. I think they said that because the 
> XML tech community, as a general rule, limits its discussions to
> generalizations specific to small data stores, which is fine in and of 
> itself, but which leads to grave errors otherwise.
> 
> Also, I have to smile when someone says something like "implement XML 
> within a relational database".
> 
> Thanks for your response, and in my opinion you are exactly and 
> precisely correct. Delivering this functionality on this scale using the 
> currently available computer hardware and connectivity almost without 
> exception requires the use of normal forms, and normal forms are best 
> implemented in a dbms, not a document file.
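The point that normal forms belong in a DBMS can be made concrete with a small sketch using Python's built-in `sqlite3` module. The schema (tables `city` and `para`, the `{name}`-style template slots) and the revised population figure are hypothetical and not from the thread. The sketch only shows that each reusable fact is stored once and the prose picks it up by a join, so an update touches one row rather than every document that mentions the city.

```python
import sqlite3

# Hypothetical normalized schema: each reusable fact lives in exactly one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (
    id         TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    state      TEXT NOT NULL,
    population INTEGER NOT NULL
);
CREATE TABLE para (
    id       INTEGER PRIMARY KEY,
    city_id  TEXT NOT NULL REFERENCES city(id),
    template TEXT NOT NULL   -- prose with {name}, {state}, {population} slots
);
""")
conn.execute("INSERT INTO city VALUES ('Miami', 'Miami', 'Florida', 1234000)")
conn.execute(
    "INSERT INTO para (city_id, template) VALUES (?, ?)",
    ("Miami", "The city of {name}, {state} (pop. {population:,}) is a sprawling city."),
)


def render(conn, para_id):
    """Join a paragraph to the single row that holds its city facts."""
    template, name, state, pop = conn.execute(
        """SELECT p.template, c.name, c.state, c.population
           FROM para p JOIN city c ON c.id = p.city_id
           WHERE p.id = ?""",
        (para_id,),
    ).fetchone()
    return template.format(name=name, state=state, population=pop)


print(render(conn, 1))
# One UPDATE (made-up figure); every paragraph joined to this row sees it.
conn.execute("UPDATE city SET population = 1300000 WHERE id = 'Miami'")
print(render(conn, 1))
```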
> 
> {;^)
> 
> At 06:49 PM 8/19/2003 +0200, Sai Surya Kiran Evani wrote:
> 
>> Hi,
>>
>> Regarding the ripple effects of updates to XML documents that you
>> mentioned: would some kind of "normal forms", like those in relational
>> design, help when designing the schemas for the XML documents? However,
>> I do not know whether such normal forms exist for guiding XML schema
>> design.
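The question can be illustrated with the kind of redundancy a normal form is meant to remove. The sketch below assumes three hypothetical documents that each embed their own copy of the Miami facts from Roger's example, one of them stale, and checks the functional dependency "City/@id determines population" across documents. It is an ad hoc consistency check written for illustration, not an established XML normal form.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical documents, each carrying its own copy of the same city facts.
DOCS = {
    "guide.xml": '<doc><City id="Miami"><population>1,234,000</population></City></doc>',
    "tides.xml": '<doc><City id="Miami"><population>1,234,000</population></City></doc>',
    "outdated.xml": '<doc><City id="Miami"><population>1,100,000</population></City></doc>',
}


def fd_violations(docs, key_attr="id", dependent="population"):
    """Report keys whose dependent value differs between documents.

    If the dependency City/@id -> City/population holds, each id maps to
    exactly one population value; any id with more than one is an update
    anomaly left behind by an incomplete ripple of changes.
    """
    seen = defaultdict(set)  # id -> {(value, filename), ...}
    for name, text in docs.items():
        for city in ET.fromstring(text).iter("City"):
            seen[city.get(key_attr)].add((city.findtext(dependent), name))
    return {
        key: sorted(pairs)
        for key, pairs in seen.items()
        if len({value for value, _ in pairs}) > 1
    }


print(fd_violations(DOCS))
```

Storing the facts once and referring to them by id, as in Roger's fragments, makes this class of inconsistency impossible rather than merely detectable.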
>>
>> Regards,
>> Kiran.
>>
>> dbexcom wrote:
>>
>>> I am concerned to hear this approach, and others here, discussed, 
>>> without comment as to scaling issues regarding very large datastores 
>>> (in XML documents or in relational dbms) that might be ten to several 
>>> hundred terabytes in size.
>>>
>>> Specifically, in the following respects:
>>> 1- sheer size problems such as disk access time, out of memory 
>>> conditions, and processor time to parse very large XML documents 
>>> (say, 1,000 documents of 1 terabyte each) or a very large number of 
>>> XML documents of smaller size (say, 5,000,000 5MB docs).
>>> 2- maintenance issues driven by the smallest of interface changes or
>>> presentation changes, which result in hundreds of thousands if not
>>> millions of manual static schema modifications, rippling across
>>> either a very large number of smaller XML documents and their
>>> specific schemas or through as many as a thousand or so documents of
>>> 1 terabyte each in size. Even if such ripple-effect maintenance can
>>> be automated, the processing time required to update, say, 5,000,000
>>> XML doc files of 5MB each cannot be said to be real time, so perhaps
>>> weeks of processing time are required before the interface mods can be
>>> subjected to just one full test.
>>> 3- consistency across versions, releases, XML standards and tool sets 
>>> (MS, SQL Server, MySQL, Oracle, etc.) considering that a very large
>>> scale project will take some time to mature (possibly years), and 
>>> that a lack of backward compatibility could drive massive changes 
>>> into the basic XML design structure and overall document architecture.
>>> 4- transmission time across interchanges: whether LAN, web or
>>> intranet based, the time to transmit and parse XQuery result sets
>>> is often large, and for very large XML documents this
>>> processing time is unacceptably long. People want results in five to
>>> eleven seconds, not minutes, not hours.
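Points 1 and 2 can be made concrete with some back-of-envelope arithmetic. The document counts and sizes come from the message above; the 50 MB/s single-parser throughput and the 100-worker comparison are assumptions chosen only for illustration, not measurements of any particular parser or machine.

```python
# Sizes from points 1 and 2, in decimal units (1 TB = 10**6 MB).
MB_PER_TB = 10**6

small_docs_tb = 5_000_000 * 5 / MB_PER_TB  # 5,000,000 docs of 5 MB each
large_docs_tb = 1_000 * 1                  # 1,000 docs of 1 TB each

# ASSUMPTION: one parser sustains 50 MB/s; real throughput depends on the
# parser, the hardware, and the structure of the documents.
ASSUMED_MB_PER_S = 50


def days_to_process(total_tb, mb_per_s=ASSUMED_MB_PER_S, workers=1):
    """Wall-clock days to read every byte once, split evenly across workers."""
    seconds = total_tb * MB_PER_TB / (mb_per_s * workers)
    return seconds / 86_400


for label, tb in [("5,000,000 x 5 MB", small_docs_tb), ("1,000 x 1 TB", large_docs_tb)]:
    print(f"{label}: {tb:,.0f} TB, "
          f"{days_to_process(tb):.1f} days on 1 worker, "
          f"{days_to_process(tb, workers=100):.2f} days on 100 workers")
```

Under the assumed rate, a single sequential pass over the 25 TB case takes roughly six days and over the 1,000 TB case roughly 230 days, before any update, validation, or test work. Adding workers shortens wall-clock time but not the total amount of parsing.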
>>>
>>> I have specific experience in very large paper-based and relational
>>> database systems. From time to time, I see folks scale up systems 
>>> that work fine, up to a point, past which they are forced to redesign 
>>> from scratch.
>>>
>>> While I agree that broadly generalized discussions are the most 
>>> common form of technical exchange of information, having seen several 
>>> of these pilot efforts crash and burn, I feel a moral obligation to 
>>> suggest that some comment be made as to scaling issues, known 
>>> propagation or ripple effects, and sheer size problems that come into 
>>> play when viable "average" architectures are scaled beyond their 
>>> design parameters.
>>>
>>> In reference to this specific method, I submit that when dealing with 
>>> a very large repository of prose, a very large number of
>>> "profile documents" is possible, and that the number of possible 
>>> "profile documents" correlates to some index of the context and the 
>>> subject matter and the usage purposes (inquiry / result pairs), a 
>>> result that to my mind increases or scales up as the number of prose 
>>> entities scales up. I will go further and say that, for instance, for 
>>> all articles ever published in the scientific journal "Nature", or 
>>> perhaps all items in the U.S. Library of Congress or all pending 
>>> applications and issued patent files in the U.S. Patent Office, this 
>>> number of possible "profile documents" becomes very large indeed. 
>>> Though it may be possible to satisfy as much as a majority of
>>> inquiries with a small number of such structures, the rest of the 
>>> inquiries, it seems to me, will require an ever increasing number of 
>>> "profile documents" to satisfy so that satisfying the last 1 percent 
>>> of such inquiries might require several thousands of such "profile 
>>> documents", if not tens of thousands or hundreds of thousands.
>>>
>>> So, I am interested to hear about practical applications using XML 
>>> only implementation (XQuery, XML, XSLT, XPath, etc) that deal with 
>>> wide-ranging subject matter, such as is found in the scientific
>>> journal "Nature", or perhaps all items in the U.S. Library of 
>>> Congress or all pending applications and issued patent files in the 
>>> U.S. Patent Office, to a very broad audience, across scientific 
>>> disciplines and cultures (and possibly languages), for a very large 
>>> data repository of mixed content (prose, graphics, slides, photos, 
>>> video, sound, other streaming data sources or media) measured in tens 
>>> or hundreds of terabytes.
>>>
>>> While XML is superb at document markup, in my experience almost as
>>> good as TeX, it does not strike me as the best tool for the job when 
>>> dealing with very large scale data repositories. Still, I have an 
>>> open mind and perhaps someone here can enlighten me.
>>>
>>> Thank you.
>>>
>>>
>>>
>>> At 10:28 PM 8/18/2003 -0400, you wrote:
>>>
>>>> One of the difficulties in considering factoring out functionally 
>>>> dependent entities from prose, is that the block of prose may itself 
>>>> not be worth reusing. That is, the prose may be a one-shot document 
>>>> whose original intent is simply to present information, not to act 
>>>> as a reliable container for access by clients with a variety of 
>>>> intents.
>>>> One thing I've done is to try to identify those concepts which are 
>>>> best understood, are most firmly established, and which serve as the 
>>>> focus of the stakeholders' activities and communications.  Then 
>>>> design a profile document for each of these high-level concepts, 
>>>> which provide context for making pointers and for generating 
>>>> identifiers. The profiles are designed to provide some elements 
>>>> which are rigidly structured, and other elements which are prose 
>>>> with mixed content. In one case at least, this allowed me (with a 
>>>> stylesheet) to resolve most cross references internal to the 
>>>> document itself, minimizing calls to scan external documents. Also, 
>>>> depending upon the nature of your data and your validation 
>>>> techniques, you may be able to use the mixed content prose as the 
>>>> source of the definitive information, rather than just as glue.
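The profile-document idea can be sketched in Python's standard `xml.etree.ElementTree` rather than the stylesheet described above. The element names, the `idref` attribute, and the id scheme are hypothetical. References that point inside the profile are resolved locally; any others are returned as a list for a separate, costlier external lookup.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile document: rigidly structured facts carry ids, and the
# mixed-content prose points at them (or elsewhere) with <ref idref="..."/>.
PROFILE = """<profile id="Miami">
  <facts>
    <state id="Miami.state">Florida</state>
    <population id="Miami.pop">1,234,000</population>
  </facts>
  <description>Miami, <ref idref="Miami.state"/> (pop. <ref idref="Miami.pop"/>)
  is a sprawling city. Tides: <ref idref="MiamiBeachTides"/>.</description>
</profile>"""


def resolve_internal(root):
    """Replace each <ref> whose target lives in this document with the target's
    text, and return the idrefs that would still need an external lookup."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    external = []
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "ref":
                continue
            target = by_id.get(child.get("idref"))
            if target is None:
                external.append(child.get("idref"))
                continue
            # Splice the target's text, then the ref's own tail, into the
            # position the <ref> occupied, then drop the <ref> element.
            text = (target.text or "") + (child.tail or "")
            idx = list(parent).index(child)
            if idx == 0:
                parent.text = (parent.text or "") + text
            else:
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + text
            parent.remove(child)
    return external


root = ET.fromstring(PROFILE)
print(resolve_internal(root))
print(" ".join("".join(root.find("description").itertext()).split()))
```

Keeping the structured facts and the prose that cites them in one profile is what lets most references resolve without opening any other document.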
>>>> It is certainly something a good CMS can help with, but I've also 
>>>> used DSSSL and XSLT/XPath for doing just this sort of thing with 
>>>> reasonable results. You might also want to check out DITA by Michael 
>>>> Priestley et al. of IBM, which I think intends to facilitate topical 
>>>> reuse.
>>>>
>>>> Roger L. Costello wrote:
>>>>
>>>>> Hi Folks,
>>>>> I am working with some people who wish to migrate from an
>>>>> all-prose format to a prose-plus-reusable-XML-fragments
>>>>> format.
>>>>> They have some data in prose that is useable in many contexts.  They
>>>>> want to break out that reusable data into XML fragments.  However,
>>>>> they want to continue to provide the prose style.
>>>>> For example, consider this prose data:
>>>>> <para>The city of Miami, Florida (pop. 1,234,000) is a sprawling city
>>>>> with many attractions.  Miami Beach is a popular attraction.  The
>>>>> spring tide is ... The neap tide is ... </para>
>>>>> Examining this prose we can extract reusable info about the city of
>>>>> Miami:
>>>>> <City id="Miami">
>>>>>     <state>Florida</state>
>>>>>     <population>1,234,000</population>
>>>>> </City>
>>>>> We can also extract reusable info about tide data on Miami Beach:
>>>>> <TideData id="MiamiBeachTides">
>>>>>     <springTide>...</springTide>
>>>>>     <neapTide>...</neapTide>
>>>>> </TideData>
>>>>> The problem now is to create a framework which allows the prose
>>>>> to bring-together the independent, reusable XML components.
>>>>> Conceptually, what is desired is a "glue framework" like this:
>>>>> <para>The <ref href="Miami.xml"/> is a sprawling city with
>>>>> many attractions.  Miami Beach is a popular attraction.  The
>>>>> tides are <ref href="MiamiBeachTides.xml"/>.</para>
>>>>> Thus, the prose is "glueing" together the XML fragments.
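The glue framework can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. This is one possible approach, not a standard one: the in-memory `FRAGMENTS` dict stands in for the files `Miami.xml` and `MiamiBeachTides.xml`, and the renderer functions, which decide how each fragment type reads as prose, are hypothetical. Because the `City` fragment has no name element, the sketch uses its `id` as the city name.

```python
import xml.etree.ElementTree as ET

# Stand-ins for the separate fragment files; a real system would load them
# from disk or a repository (assumption).
FRAGMENTS = {
    "Miami.xml": '<City id="Miami"><state>Florida</state>'
                 '<population>1,234,000</population></City>',
    "MiamiBeachTides.xml": '<TideData id="MiamiBeachTides">'
                           '<springTide>...</springTide><neapTide>...</neapTide></TideData>',
}


# Hypothetical per-element-type renderers that turn a fragment into prose.
def render_city(el):
    return (f"city of {el.get('id')}, {el.findtext('state')} "
            f"(pop. {el.findtext('population')})")


def render_tides(el):
    return (f"{el.findtext('springTide')} at spring tide and "
            f"{el.findtext('neapTide')} at neap tide")


RENDERERS = {"City": render_city, "TideData": render_tides}


def glue(para_xml, load=FRAGMENTS.__getitem__):
    """Expand every <ref href="..."/> in a paragraph into rendered prose."""
    para = ET.fromstring(para_xml)
    parts = [para.text or ""]
    for child in para:
        if child.tag == "ref":
            fragment = ET.fromstring(load(child.get("href")))
            parts.append(RENDERERS[fragment.tag](fragment))
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


PARA = ('<para>The <ref href="Miami.xml"/> is a sprawling city with many '
        'attractions. Miami Beach is a popular attraction. The tides are '
        '<ref href="MiamiBeachTides.xml"/>.</para>')

print(glue(PARA))
```

Python's standard library also provides `xml.etree.ElementInclude`, which performs XInclude-style inclusion. It splices in the fragment markup itself rather than rendering it as prose, so a renderer step like the one above would still be needed to produce the sentence form.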
>>>>> Is this a problem that you have experience with?  What "glue
>>>>> framework" have you used?  What strategy did you use to merge
>>>>> the XML fragments with the prose?  Is there a standard way
>>>>> of combining semi-structured data with structured data?
>>>>> /Roger
>>>>>
>>>>> -----------------------------------------------------------------
>>>>> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>>>> initiative of OASIS <http://www.oasis-open.org>
>>>>> The list archives are at http://lists.xml.org/archives/xml-dev/
>>>>> To subscribe or unsubscribe from this list use the subscription
>>>>> manager: <http://lists.xml.org/ob/adm.pl>
>>
>>
>>
>>
>>
> 
> 
> 
> ************* NOTE: ************************
> 
> Copyright CDS, Inc, 2003. All rights withheld.
> 
> The information in this message is strictly confidential and may be
> legally privileged. It is intended solely for the addressee. Access to
> this message by any other person is prohibited. If you are not the
> intended recipient, any disclosure, copying, distribution or any action
> taken or omitted to be taken in reliance on it, is prohibited and may be
> unlawful. Please immediately contact the sender should this message have
> been incorrectly transmitted.
> 
> This message text and any attached files are Copyright CDS, Inc 2003,
> and may not be reproduced, copied, distributed or released by any
> mechanical or electronic means.
> 
> All rights are withheld.
> *********************************************************
> 
> 
>

References:
- A standard approach to glueing together reusable XML fragments in prose?
  - From: "Roger L. Costello" <costello@mitre.org>
- Re: [xml-dev] A standard approach to glueing together reusable XML fragments in prose?
  - From: dbexcom <lbradshaw@dbex.com>
- Re: [xml-dev] A standard approach to glueing together reusable XML fragments in prose?
  - From: lbradshaw@dbex.com
