   Re: [xml-dev] A standard approach to glueing together reusable XML frag


Ok, I'll just come out and say "that's stupid". 

"XML only" simply doesn't make sense. There is no such thing. 
It seems contrary to the whole point.
But maybe I'm in a minority.



lbradshaw@dbex.com wrote:

> 
> Ummm, the situation presented is that an XML only solution set can 
> provide the needed functionality, response time, and other capabilities.
> 
> No worries here that I can do this using a relational DBMS, normal 
> forms, etc. But doing it only using XML constructs, and tools specific 
> to XML, does not seem doable, reasonable, appropriate or supportable 
> over time.
> 
> Someone told me, verbally, and emphatically, that it is doable in XML 
> only constructs, to which I replied "Show me." and from whom no such 
> demonstration has been forthcoming. I think they said that because the 
> XML tech community, as a general rule, limits its discussions to 
> generalizations specific to small data stores, which is fine in and of 
> itself, but which leads to grave errors otherwise.
> 
> Also, I have to smile when someone says something like "implement XML 
> within a relational database".
> 
> Thanks for your response, and in my opinion you are exactly and 
> precisely correct. Delivering this functionality on this scale using the 
> currently available computer hardware and connectivity almost without 
> exception requires the use of normal forms, and normal forms are best 
> implemented in a dbms, not a document file.
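
The point about normal forms can be illustrated with a minimal sketch, assuming a hypothetical two-table layout (the `city` and `paragraph` tables, the templates, and the updated population figure are all invented for illustration, not taken from the thread). Each fact is stored once, so a change touches one row and every paragraph that uses it picks up the new value:

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (id TEXT PRIMARY KEY, state TEXT, population INTEGER);
CREATE TABLE paragraph (
    id INTEGER PRIMARY KEY,
    city_id TEXT REFERENCES city(id),
    template TEXT
);
INSERT INTO city VALUES ('Miami', 'Florida', 1234000);
INSERT INTO paragraph (city_id, template) VALUES
  ('Miami', 'The city of {id}, {state} (pop. {population:,}) is a sprawling city.'),
  ('Miami', '{id} has {population:,} residents.');
""")

def render_all(conn):
    """Join each prose template with the single stored copy of the city facts."""
    rows = conn.execute(
        "SELECT p.template, c.id, c.state, c.population "
        "FROM paragraph p JOIN city c ON c.id = p.city_id ORDER BY p.id"
    )
    return [t.format(id=i, state=s, population=n) for t, i, s, n in rows]

before = render_all(conn)
# One-row update (illustrative figure); no paragraph text is edited.
conn.execute("UPDATE city SET population = 1300000 WHERE id = 'Miami'")
after = render_all(conn)
print(before)
print(after)
```

Whether the same effect can be had at scale with XML tooling alone is exactly the open question in this thread; the sketch only shows what the relational side gives you.
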
> 
> {;^)
> 
> At 06:49 PM 8/19/2003 +0200, Sai Surya Kiran Evani wrote:
> 
>> Hi,
>>
>> Regarding the ripple effects of updates to XML documents that you 
>> mentioned: would some kind of "normal forms", like those in relational 
>> design, be of help when designing the schemas for the XML documents? 
>> However, I do not know whether such normal forms exist for guiding XML 
>> schema design.
>>
>> Regards,
>> Kiran.
>>
>> dbexcom wrote:
>>
>>> I am concerned to hear this approach, and others here, discussed, 
>>> without comment as to scaling issues regarding very large datastores 
>>> (in XML documents or in relational dbms) that might be ten to several 
>>> hundred terabytes in size.
>>>
>>> Specifically, in the following respects:
>>> 1- sheer size problems such as disk access time, out of memory 
>>> conditions, and processor time to parse very large XML documents 
>>> (say, 1,000 documents of 1 terabyte each) or a very large number of 
>>> XML documents of smaller size (say, 5,000,000 5MB docs).
>>> 2- maintenance issues driven by the smallest of interface changes or 
>>> presentation changes, that result in hundreds of thousands if not 
>>> millions of manual static schema modifications, rippling across 
>>> either a very large number of smaller XML documents and their 
>>> specific schemas or through as many as a thousand or so documents of 
>>> 1 terabyte each in size. Even if such ripple effect maintenance can 
>>> be automated, the processing time required to update, say,  5,000,000 
>>> XML doc files of 5MB each cannot be said to be real time, so perhaps 
>>> weeks of processing time is required before the interface mods can be 
>>> subject to just one full test.
>>> 3- consistency across versions, releases, XML standards and tool sets 
>>> (MS, SQL Server, MySQL, Oracle, etc) considering that a very large 
>>> scale project will take some time to mature (possibly years), and 
>>> that a lack of backward compatibility could drive massive changes 
>>> into the basic XML design structure and overall document architecture.
>>> 4- transmission time across interchanges - whether LAN, web or 
>>> intranet based, the time to transmit and parse result sets to XQuery 
>>> is often very large, and for very large XML documents this 
>>> processing time is unacceptably long. People want results in five to 
>>> eleven seconds, not minutes, not hours.
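
A rough back-of-envelope check on the sizes quoted in items 1 and 2, as a sketch only. The document counts and sizes come from the message above; the sustained throughput of 50 MB/s for a single-machine parse-and-rewrite pass is purely an assumption chosen for illustration, and decimal units (1 MB = 10^6 bytes) are assumed:

```python
MB = 10**6   # decimal megabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)

def corpus_bytes(doc_count, doc_bytes):
    """Total size of a corpus of equally sized documents."""
    return doc_count * doc_bytes

def full_pass_days(total_bytes, bytes_per_second):
    """Days needed to read and rewrite every byte once at a fixed rate."""
    return total_bytes / bytes_per_second / 86_400

# Figures from the message: 5,000,000 docs of 5 MB, or 1,000 docs of 1 TB.
small_docs = corpus_bytes(5_000_000, 5 * MB)
large_docs = corpus_bytes(1_000, 1 * TB)

# Hypothetical sustained throughput; not a measured number.
ASSUMED_THROUGHPUT = 50 * MB

for label, total in (("5,000,000 x 5 MB", small_docs), ("1,000 x 1 TB", large_docs)):
    days = full_pass_days(total, ASSUMED_THROUGHPUT)
    print(f"{label}: {total / TB:.0f} TB, about {days:.1f} days per full pass")
```

Under that assumption a single sequential pass over the smaller corpus takes days, not seconds; parallel hardware changes the constant, not the fact that a full rewrite is far from real time.
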
>>>
>>> I have specific experience in very large paper based, and relational 
>>> database systems. From time to time, I see folks scale up systems 
>>> that work fine, up to a point, past which they are forced to redesign 
>>> from scratch.
>>>
>>> While I agree that broadly generalized discussions are the most 
>>> common form of technical exchange of information, having seen several 
>>> of these pilot efforts crash and burn, I feel a moral obligation to 
>>> suggest that some comment be made as to scaling issues, known 
>>> propagation or ripple effects, and sheer size problems that come into 
>>> play when viable "average" architectures are scaled beyond their 
>>> design parameters.
>>>
>>> In reference to this specific method, I submit that when dealing with 
>>> a very large repository of prose, a very large number of 
>>> "profile documents" is possible, and that the number of possible 
>>> "profile documents" correlates to some index of the context, the 
>>> subject matter and the usage purposes (inquiry / result pairs), a 
>>> number that to my mind scales up as the number of prose 
>>> entities scales up. I will go further and say that, for instance, for 
>>> all articles ever published in the scientific journal "Nature", or 
>>> perhaps all items in the U.S. Library of Congress or all pending 
>>> applications and issued patent files in the U.S. Patent Office, this 
>>> number of possible "profile documents" becomes very large indeed. 
>>> Though it may be possible to satisfy as much as a majority of 
>>> inquiries with a small number of such structures, the rest of the 
>>> inquiries, it seems to me, will require an ever increasing number of 
>>> "profile documents" to satisfy so that satisfying the last 1 percent 
>>> of such inquiries might require several thousands of such "profile 
>>> documents", if not tens of thousands or hundreds of thousands.
>>>
>>> So, I am interested to hear about practical applications using XML 
>>> only implementation (XQuery, XML, XSLT, XPath, etc) that deal with 
>>> wide ranging subject matter, such as is found  in the scientific 
>>> journal "Nature", or perhaps all items in the U.S. Library of 
>>> Congress or all pending applications and issued patent files in the 
>>> U.S. Patent Office, to a very broad audience, across scientific 
>>> disciplines and cultures (and possibly languages), for a very large 
>>> data repository of mixed content (prose, graphics, slides, photos, 
>>> video, sound, other streaming data sources or media) measured in tens 
>>> or hundreds of terabytes.
>>>
>>> While XML is superb at document mark up, in my experience almost as 
>>> good as TeX, it does not strike me as the best tool for the job when 
>>> dealing with very large scale data repositories. Still, I have an 
>>> open mind and perhaps someone here can enlighten me.
>>>
>>> Thank you.
>>>
>>>
>>>
>>> At 10:28 PM 8/18/2003 -0400, you wrote:
>>>
>>>> One of the difficulties in considering factoring out functionally 
>>>> dependent entities from prose, is that the block of prose may itself 
>>>> not be worth reusing. That is, the prose may be a one-shot document 
>>>> whose original intent is simply to present information, not to act 
>>>> as a reliable container for access by clients with a variety of 
>>>> intents.
>>>> One thing I've done is to try to identify those concepts which are 
>>>> best understood, are most firmly established, and which serve as the 
>>>> focus of the stakeholders' activities and communications.  Then 
>>>> design a profile document for each of these high-level concepts, 
>>>> which provide context for making pointers and for generating 
>>>> identifiers. The profiles are designed to provide some elements 
>>>> which are rigidly structured, and other elements which are prose 
>>>> with mixed content. In one case at least, this allowed me (with a 
>>>> stylesheet) to resolve most cross references internal to the 
>>>> document itself, minimizing calls to scan external documents. Also, 
>>>> depending upon the nature of your data and your validation 
>>>> techniques, you may be able to use the mixed content prose as the 
>>>> source of the definitive information, rather than just as glue.
>>>> It is certainly something a good CMS can help with, but I've also 
>>>> used DSSSL and XSLT/XPath for doing just this sort of thing with 
>>>> reasonable results. You might also want to check out DITA by Michael 
>>>> Priestley et al. of IBM, which I think intends to facilitate topical 
>>>> reuse.
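
One possible reading of the profile-document idea, as a minimal sketch. The element names (`profile`, `facts`, `fact`, `prose`, `xref`) and the `idref` attribute are hypothetical, since the message does not show its markup, and Python stands in here for the stylesheet it mentions. Rigidly structured facts and mixed-content prose live in one document, and cross references are resolved internally first:

```python
import xml.etree.ElementTree as ET

# Hypothetical profile document: structured facts plus mixed-content prose.
PROFILE = """
<profile id="Miami">
  <facts>
    <fact id="state">Florida</fact>
    <fact id="population">1,234,000</fact>
  </facts>
  <prose>Miami, <xref idref="state"/> (pop. <xref idref="population"/>) is a sprawling city.</prose>
</profile>
"""

def resolve_internal(profile_xml):
    """Return (resolved prose, list of idrefs not found in this document)."""
    root = ET.fromstring(profile_xml)
    # Index every element in the document that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    prose = root.find("prose")
    parts = [prose.text or ""]
    unresolved = []
    for child in prose:
        if child.tag == "xref":
            target = by_id.get(child.get("idref"))
            if target is not None:
                parts.append(target.text or "")
            else:
                # Only these would need a scan of external documents.
                unresolved.append(child.get("idref"))
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts), unresolved

text, missing = resolve_internal(PROFILE)
print(text)
print(missing)
```

Only the references left in `missing` would require a call out to another document, which is the saving the message describes.
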
>>>>
>>>> Roger L. Costello wrote:
>>>>
>>>>> Hi Folks,
>>>>> I am working with some people who wish to migrate from an
>>>>> all-prose format to a prose-plus-reusable-XML-fragments
>>>>> format.
>>>>> They have some data in prose that is useable in many contexts.  They
>>>>> want to break out that reusable data  into XML fragments.  However,
>>>>> they want to continue to provide the prose style.
>>>>> For example, consider this prose data:
>>>>> <para>The city of Miami, Florida (pop. 1,234,000) is a sprawling city
>>>>> with many attractions.  Miami Beach is a popular attraction.  The
>>>>> spring tide is ... The neap tide is ... </para>
>>>>> Examining this prose we can extract reusable info about the city of
>>>>> Miami:
>>>>> <City id="Miami">
>>>>>     <state>Florida</state>
>>>>>     <population>1,234,000</population>
>>>>> </City>
>>>>> We can also extract reusable info about tide data on Miami Beach:
>>>>> <TideData id="MiamiBeachTides">
>>>>>     <springTide>...</springTide>
>>>>>     <neapTide>...</neapTide>
>>>>> </TideData>
>>>>> The problem now is to create a framework which allows the prose
>>>>> to bring-together the independent, reusable XML components.
>>>>> Conceptually, what is desired is a "glue framework" like this:
>>>>> <para>The <ref href="Miami.xml"/> is a sprawling city with
>>>>> many attractions.  Miami Beach is a popular attraction.  The
>>>>> tides are <ref href="MiamiBeachTides.xml"/></para>
>>>>> Thus, the prose is "glueing" together the XML fragments.
>>>>> Is this a problem that you have experience with?  What  "glue
>>>>> framework" have you used?  What strategy did you use to merge
>>>>> the XML fragments with the prose?  Is there a standard way
>>>>> of combining semi-structured data with structured data?
>>>>> /Roger
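
The "glue framework" described above can be sketched in a few lines, under stated assumptions: the `ref` elements are written as empty elements, the fragments are held in an in-memory dictionary standing in for Miami.xml and MiamiBeachTides.xml, and the two rendering functions are hypothetical choices about how each fragment should read as prose. This is one way to do the merge, not a standard:

```python
import xml.etree.ElementTree as ET

# Stand-in for the fragment files; the tide values are elided in the original.
FRAGMENTS = {
    "Miami.xml": '<City id="Miami"><state>Florida</state>'
                 '<population>1,234,000</population></City>',
    "MiamiBeachTides.xml": '<TideData id="MiamiBeachTides">'
                           '<springTide>...</springTide>'
                           '<neapTide>...</neapTide></TideData>',
}

PARA = ('<para>The <ref href="Miami.xml"/> is a sprawling city with many '
        'attractions.  Miami Beach is a popular attraction.  The tides are '
        '<ref href="MiamiBeachTides.xml"/></para>')

def render_city(city):
    return "city of {}, {} (pop. {})".format(
        city.get("id"), city.findtext("state"), city.findtext("population"))

def render_tides(tides):
    return "spring {}, neap {}".format(
        tides.findtext("springTide"), tides.findtext("neapTide"))

# Hypothetical mapping from fragment root element to its prose rendering.
RENDERERS = {"City": render_city, "TideData": render_tides}

def resolve_refs(para_xml, load=FRAGMENTS.__getitem__):
    """Replace each direct <ref> child of the paragraph with rendered text."""
    para = ET.fromstring(para_xml)
    prev = None  # last child kept in the tree, whose tail receives new text
    for child in list(para):
        if child.tag != "ref":
            prev = child
            continue
        fragment = ET.fromstring(load(child.get("href")))
        text = RENDERERS[fragment.tag](fragment) + (child.tail or "")
        if prev is None:
            para.text = (para.text or "") + text
        else:
            prev.tail = (prev.tail or "") + text
        para.remove(child)
    return "".join(para.itertext())

resolved = resolve_refs(PARA)
print(resolved)
```

The sketch only handles `ref` elements that are direct children of the paragraph; nested references or fragments that themselves contain references would need a recursive walk.
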
>>>>>
>>>>> -----------------------------------------------------------------
>>>>> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>>>> initiative of OASIS <http://www.oasis-open.org>
>>>>> The list archives are at http://lists.xml.org/archives/xml-dev/
>>>>> To subscribe or unsubscribe from this list use the subscription
>>>>> manager: <http://lists.xml.org/ob/adm.pl>
>>>>
>>>
>>
>>
>>
>>
>>
> 
> 
> 
> 
> 
> 






 


Copyright 2001 XML.org. This site is hosted by OASIS