Ok, I'll just come out and say "that's stupid".
"XML only" simply doesn't make sense. There is no such thing.
It seems contrary to the whole point.
But maybe I'm in a minority.
> Ummm, the situation presented is that an XML only solution set can
> provide the needed functionality, response time, and other capabilities.
> No worries here that I can do this using a relational DBMS, normal
> forms, etc. But doing it only using XML constructs, and tools specific
> to XML, does not seem doable, reasonable, appropriate or supportable
> over time.
> Someone told me, verbally, and emphatically, that it is doable in XML
> only constructs, to which I replied "Show me." and from whom no such
> demonstration has been forthcoming. I think they said that because the
> XML tech community, as a general rule, limits its discussions to
> generalizations specific to small data stores, which is fine in and of
> itself, but which leads to grave errors otherwise.
> Also, I have to smile when someone says something like "implement XML
> within a relational database".
> Thanks for your response, and in my opinion you are exactly and
> precisely correct. Delivering this functionality on this scale using the
> currently available computer hardware and connectivity almost without
> exception requires the use of normal forms, and normal forms are best
> implemented in a dbms, not a document file.
> At 06:49 PM 8/19/2003 +0200, Sai Surya Kiran Evani wrote:
>> Regarding the ripple effects of updates to XML documents that you
>> mentioned: would some kind of "normal forms", as in relational design,
>> help when designing the schemas for the XML documents? However, I do
>> not know whether such normal forms exist to guide XML schema design.
>> dbexcom wrote:
>>> I am concerned to hear this approach, and others here, discussed,
>>> without comment as to scaling issues regarding very large datastores
>>> (in XML documents or in relational dbms) that might be ten to several
>>> hundred terabytes in size.
>>> Specifically, in the following respects:
>>> 1- sheer size problems such as disk access time, out of memory
>>> conditions, and processor time to parse very large XML documents
>>> (say, 1,000 documents of 1 terabyte each) or a very large number of
>>> XML documents of smaller size (say, 5,000,000 5MB docs).
>>> 2- maintenance issues driven by the smallest of interface changes or
>>> presentation changes, that result in hundreds of thousands if not
>>> millions of manual static schema modifications, rippling across
>>> either a very large number of smaller XML documents and their
>>> specific schemas or through as many as a thousand or so documents of
>>> 1 terabyte each in size. Even if such ripple effect maintenance can
>>> be automated, the processing time required to update, say, 5,000,000
>>> XML doc files of 5MB each cannot be said to be real time, so perhaps
>>> weeks of processing time is required before the interface mods can be
>>> subject to just one full test.
>>> 3- consistency across versions, releases, XML standards and tool sets
>>> (MS, SQL Server, MySQL, Oracle, etc) considering that a very large
>>> scale project will take some time to mature (possibly years), and
>>> that a lack of backward compatibility could drive massive changes
>>> into the basic XML design structure and overall document architecture.
>>> 4- transmission time across interchanges - whether lan, web or
>>> intranet based, the time to transmit and parse result sets to XQuery
>>> is often very large, and for very large XML documents this
>>> processing time is unacceptably long. People want results in five to
>>> eleven seconds, not minutes, not hours.
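To put rough numbers on items 1 and 2 above, here is a back-of-envelope sketch in Python. The document counts and sizes are the ones quoted in the list. The throughput of 100 MB/s per worker and the assumption of perfect parallel scaling are invented purely for illustration and are not measurements of any real system.

```python
# Back-of-envelope sizing for the two corpus shapes quoted above.
# ASSUMPTIONS (not from the thread): decimal units, and a sustained
# read-parse-rewrite throughput of 100 MB/s per worker.

MB = 10**6
TB = 10**12


def corpus_bytes(doc_count: int, doc_size_bytes: int) -> int:
    """Total size of a corpus of equally sized documents."""
    return doc_count * doc_size_bytes


def full_pass_seconds(total_bytes: int, bytes_per_second: float, workers: int = 1) -> float:
    """Time for one pass over every byte, assuming perfect parallel scaling."""
    return total_bytes / (bytes_per_second * workers)


many_small = corpus_bytes(5_000_000, 5 * MB)  # 5,000,000 docs of 5 MB each
few_large = corpus_bytes(1_000, 1 * TB)       # 1,000 docs of 1 TB each

ASSUMED_THROUGHPUT = 100 * MB  # hypothetical bytes per second per worker

for label, total in (("5,000,000 x 5 MB", many_small), ("1,000 x 1 TB", few_large)):
    hours = full_pass_seconds(total, ASSUMED_THROUGHPUT) / 3600
    print(f"{label}: {total / TB:.0f} TB, about {hours:.1f} hours per pass on one worker")
```

Under those assumed figures the first shape comes to 25 TB and roughly 69 hours per full pass on a single worker, and the second to 1,000 TB and roughly 116 days. Whether a ripple update takes hours or weeks therefore depends almost entirely on the throughput and parallelism actually available, which is exactly the kind of number worth stating up front.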
>>> I have specific experience in very large paper based, and relational
>>> database systems. From time to time, I see folks scale up systems
>>> that work fine, up to a point, past which they are forced to redesign
>>> from scratch.
>>> While I agree that broadly generalized discussions are the most
>>> common form of technical exchange of information, having seen several
>>> of these pilot efforts crash and burn, I feel a moral obligation to
>>> suggest that some comment be made as to scaling issues, known
>>> propagation or ripple effects, and sheer size problems that come into
>>> play when viable "average" architectures are scaled beyond their
>>> design parameters.
>>> In reference to this specific method, I submit that when dealing with
>>> a very large repository of prose, that a very large number of
>>> "profile documents" is possible, and that the number of possible
>>> "profile documents" correlates to some index of the context and the
>>> subject matter and the usage purposes (inquiry / result pairs), a
>>> result that to my mind increases or scales up as the number of prose
>>> entities scales up. I will go further and say that, for instance, for
>>> all articles ever published in the scientific journal "Nature", or
>>> perhaps all items in the U.S. Library of Congress or all pending
>>> applications and issued patent files in the U.S. Patent Office, this
>>> number of possible "profile documents" becomes very large indeed.
>>> Though it may be possible to satisfy as much as a majority of
>>> inquiries with a small number of such structures, the rest of the
>>> inquiries, it seems to me, will require an ever increasing number of
>>> "profile documents" to satisfy so that satisfying the last 1 percent
>>> of such inquiries might require several thousands of such "profile
>>> documents", if not tens of thousands or hundreds of thousands.
>>> So, I am interested to hear about practical applications using XML
>>> only implementation (XQuery, XML, XSLT, XPath, etc) that deal with
>>> wide ranging subject matter, such as is found in the scientific
>>> journal "Nature", or perhaps all items in the U.S. Library of
>>> Congress or all pending applications and issued patent files in the
>>> U.S. Patent Office, to a very broad audience, across scientific
>>> disciplines and cultures (and possibly languages), for a very large
>>> data repository of mixed content (prose, graphics, slides, photos,
>>> video, sound, other streaming data sources or media) measured in tens
>>> or hundreds of terabytes.
>>> While XML is superb at document mark up, in my experience almost as
>>> good as TeX, it does not strike me as the best tool for the job when
>>> dealing with very large scale data repositories. Still, I have an
>>> open mind and perhaps someone here can enlighten me.
>>> Thank you.
>>> At 10:28 PM 8/18/2003 -0400, you wrote:
>>>> One of the difficulties in considering factoring out functionally
>>>> dependent entities from prose, is that the block of prose may itself
>>>> not be worth reusing. That is, the prose may be a one-shot document
>>>> whose original intent is simply to present information, not to act
>>>> as a reliable container for access by clients with a variety of
>>>> One thing I've done is to try to identify those concepts which are
>>>> best understood, are most firmly established, and which serve as the
>>>> focus of the stakeholders' activities and communications. Then
>>>> design a profile document for each of these high-level concepts,
>>>> which provide context for making pointers and for generating
>>>> identifiers. The profiles are designed to provide some elements
>>>> which are rigidly structured, and other elements which are prose
>>>> with mixed content. In one case at least, this allowed me (with a
>>>> stylesheet) to resolve most cross references internal to the
>>>> document itself, minimizing calls to scan external documents. Also,
>>>> depending upon the nature of your data and your validation
>>>> techniques, you may be able to use the mixed content prose as the
>>>> source of the definitive information, rather than just as glue.
>>>> It is certainly something a good CMS can help with, but I've also
>>>> used DSSSL and XSLT/XPath for doing just this sort of thing with
>>>> reasonable results. You might also want to check out DITA by Michael
>>>> Priestley et al. of IBM, which I think intends to facilitate topical
>>>> Roger L. Costello wrote:
>>>>> Hi Folks,
>>>>> I am working with some people who wish to migrate from an
>>>>> all-prose format to a prose-plus-reusable-XML-fragments
>>>>> They have some data in prose that is useable in many contexts. They
>>>>> want to break out that reusable data into XML fragments. However,
>>>>> they want to continue to provide the prose style.
>>>>> For example, consider this prose data:
>>>>> <para>The city of Miami, Florida (pop. 1,234,000) is a sprawling city
>>>>> with many attractions. Miami Beach is a popular attraction. The
>>>>> spring tide is ... The neap tide is ... </para>
>>>>> Examining this prose we can extract reusable info about the city of
>>>>> <City id="Miami">
>>>>> We can also extract reusable info about tide data on Miami Beach:
>>>>> <TideData id="MiamiBeachTides">
>>>>> The problem now is to create a framework which allows the prose
>>>>> to bring-together the independent, reusable XML components.
>>>>> Conceptually, what is desired is a "glue framework" like this:
>>>>> <para>The <ref href="Miami.xml"/> is a sprawling city with
>>>>> many attractions. Miami Beach is a popular attraction. The
>>>>> tides are <ref href="MiamiBeachTides.xml"/></para>
>>>>> Thus, the prose is "glueing" together the XML fragments.
>>>>> Is this a problem that you have experience with? What "glue
>>>>> framework" have you used? What strategy did you use to merge
>>>>> the XML fragments with the prose? Is there a standard way
>>>>> of combining semi-structured data with structured data?
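One way to picture the "glue framework" asked about above is a small resolver that walks the prose and swaps each reference for text rendered from the structured fragment. The following is only a minimal sketch in Python using the standard library's xml.etree.ElementTree. The in-memory FRAGMENTS store, the child element names (Name, State, Population, Spring, Neap) and the render templates are all hypothetical, since the thread does not show the contents of the fragments. The tide values are left as "..." exactly as in the original prose.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment store standing in for Miami.xml and
# MiamiBeachTides.xml. The child element names are invented for
# illustration; the thread does not show the fragment contents.
FRAGMENTS = {
    "Miami.xml": '<City id="Miami"><Name>Miami</Name><State>Florida</State>'
                 '<Population>1234000</Population></City>',
    "MiamiBeachTides.xml": '<TideData id="MiamiBeachTides"><Spring>...</Spring>'
                           '<Neap>...</Neap></TideData>',
}


def render(fragment: ET.Element) -> str:
    """Turn a structured fragment back into a prose phrase."""
    if fragment.tag == "City":
        pop = int(fragment.findtext("Population"))
        return (f"city of {fragment.findtext('Name')}, "
                f"{fragment.findtext('State')} (pop. {pop:,})")
    if fragment.tag == "TideData":
        return (f"spring {fragment.findtext('Spring')}, "
                f"neap {fragment.findtext('Neap')}")
    raise ValueError(f"no prose template for <{fragment.tag}>")


def resolve_refs(para: ET.Element) -> str:
    """Flatten a <para>, replacing each direct-child <ref href="..."/>."""
    parts = [para.text or ""]
    for child in para:
        if child.tag == "ref":
            fragment = ET.fromstring(FRAGMENTS[child.get("href")])
            parts.append(render(fragment))
        else:
            # Any other inline element is kept as its plain text.
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # prose that follows the element
    return " ".join("".join(parts).split())  # normalize whitespace


glue = ET.fromstring(
    '<para>The <ref href="Miami.xml"/> is a sprawling city with '
    'many attractions. Miami Beach is a popular attraction. The '
    'tides are <ref href="MiamiBeachTides.xml"/>.</para>'
)
print(resolve_refs(glue))
```

The sketch only handles references that are direct children of the paragraph, and it keeps the presentation rules (the render templates) separate from both the prose and the fragments, so the same City fragment could be rendered differently in another context. A stylesheet-based approach would follow the same shape, with the templates expressed as XSLT rules instead of Python functions.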
>>>>> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>>>> initiative of OASIS <http://www.oasis-open.org>
>>>>> The list archives are at http://lists.xml.org/archives/xml-dev/
>>>>> To subscribe or unsubscribe from this list use the subscription
>>>>> manager: <http://lists.xml.org/ob/adm.pl>
> ************* NOTE: ************************
> Copyright CDS, Inc, 2003. All rights withheld.