Re: [xml-dev] Caught napping!
Subrahmanyam Allamaraju wrote:
> > I disagree. We need a modelling theory that allows us to focus on the logical modelling first. Then we need methods to map this logical model to some kind of physical storage, be it an rdbms, xml-enabled rdbms, hierarchical db, oodb or a 'native xml store'.
> > Without the logical modelling theory in place, we will never know whether the schema was sound and safe before we mapped it to physical storage.
> A very valid point. I don't see XML-DB proponents attempting to specify the logical data model, the role of schema in such a model, query and update semantics, what they mean, possible constraints (including integrity), types (such as numbers and dates) and so
> on. In the absence of such a logical framework, the problem seems ill-defined/under-specified to me.
Or . . . we can advance the opposite view (as some of us do): that 'XML' is intended--unlike database theory--as an instance-first approach to textual content, not a schema-first, model-first, ontology-first, or any other semantics- or concept-first one.
That premise then implies to me that a natively XML database or data manipulation engine must be designed for content which is eXtensible, at will and in the instance, through (and only through) the application of Markup--not, that is, extensible by the prior
extension of a schema or the expansion of a model. The example I use to illustrate the consequence of this premise is that a natively XML database must be able to commit any well-formed XML instance--and commit it as an instance of the document type described by
its root element. It must then be prepared to commit *as a further instance of the same document type* any other well-formed XML document which exhibits the same root element, *regardless of the structure or content model exhibited below that root element, or
the degree to which that structure or content model differs from other documents committed as of the same root element type*. Why must a native XML database engine violate the most fundamental premises of data schema in this way? Because that is what the
*markup* of the *instance* commands, and this is XML, not eXtensible Schema Instantiation or eXpandable Model Realization.
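To make that commit behaviour concrete, here is a rough sketch (Python, with invented class and element names throughout) of a store that accepts any well-formed instance and types it by nothing more than its root element:

```python
# A minimal, hypothetical sketch of instance-first commit: the only test
# applied is well-formedness, and a document's "type" is simply the name
# of its root element. No schema is consulted at any point.
import xml.etree.ElementTree as ET
from collections import defaultdict

class InstanceFirstStore:
    def __init__(self):
        self._by_root = defaultdict(list)  # root element name -> instances

    def commit(self, xml_text):
        # ParseError (ill-formedness) is the only possible rejection
        root = ET.fromstring(xml_text)
        self._by_root[root.tag].append(xml_text)
        return root.tag

    def instances_of(self, root_tag):
        return list(self._by_root[root_tag])

store = InstanceFirstStore()
store.commit("<order><sku>A1</sku></order>")
# a structurally different document with the same root element commits
# as a further instance of the same document type
store.commit("<order note='rush'><item code='A1' qty='2'/></order>")
```

Both documents are now instances of the 'order' type, however much their content models differ below the root.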
And how does one process XML instances returned from such a database, when it is clear that instance-by-instance they are unlikely to conform to a single, or even a family of similar, schemata? One processes them in accord with one's particular processing needs
and purposes, employing one's own particular processing expertise. That is to say that the structure of each XML instance delivered to a process is of negligible importance compared to the data needs and structural expectations of that process. Those data needs
and structural expectations may be entirely unknown to the database engine delivering those XML instances or to the original authors of those XML documents. The only authority on the needs and expectations of a process is that process itself. This means that
only the process can reliably determine whether, at the level of its content rather than of its structure, a particular XML document can be manipulated by the resources of that process so as to result in the instantiation of the specific data which that process requires.
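In that spirit, a process might probe an instance for the content it needs and ignore structure entirely. The following sketch (element and attribute names invented for illustration) is one way such a determination could look:

```python
# Hypothetical sketch: the process itself decides whether an instance is
# usable, at the level of content rather than structure. It hunts for a
# SKU and a quantity wherever in the document they happen to appear.
import xml.etree.ElementTree as ET

def process_can_use(xml_text):
    root = ET.fromstring(xml_text)
    # look for a SKU as element text, or failing that as an attribute
    sku = root.findtext(".//sku") or next(
        (e.get("code") for e in root.iter() if e.get("code")), None)
    # likewise for the quantity
    qty = root.findtext(".//qty") or next(
        (e.get("qty") for e in root.iter() if e.get("qty")), None)
    return sku is not None and qty is not None
```

The database never learns what this process considers usable; the criterion lives entirely in the process.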
And as a practical matter, how does a process make such determinations? The most useful tool here is the history of past data instantiations by that same process. The role that a native XML database can play to assist processes by managing that history is, it
seems to me, one of the most compelling arguments for a 'native' form of XML database engine. The transformation which must be accomplished is usually the converse of what XSLT expects: instead of a given source structure to be transformed into
various targets, any of the range of structures which a document type might exhibit may need to be transformed into the specific data instantiation which a process expects. This reversal of the expectations of XSLT goes beyond the structure of the source
document, to the nature of the content which is to be selected, and to questions of how to identify and 'validate' it. Though that content was originally committed by the database engine based on the markup properties and content which it exhibited as a document
instance, it may need to be selected for processing based on a process's very different understanding of the nature of that content. What could be less schema- or model-centric than this? This is not just a case of the database engine deriving an appropriate schema
to associate with each instance which it commits, since the processes which will retrieve that instance from the database a) will not know that derived schema; b) will understand the document that they call for on their own structural terms, which are likely
to be very different from the form in which the content they require was first committed; and c) will need to specify and find the content which they need in terms other than structural, and often in terms other than the datatypes which it apparently exhibited
in the instance in which it was originally committed.
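That converse-of-XSLT transformation might be sketched as follows: instead of one source structure fanned out to many targets, many source structures are collapsed into the one data instantiation the process expects. All names here are invented for illustration:

```python
# Hedged sketch: whatever structure the source exhibits, it is reduced
# to the one data instantiation this process expects -- here, a
# (sku, quantity) pair, with the datatype decision made by the process.
import xml.etree.ElementTree as ET

def to_process_form(xml_text):
    root = ET.fromstring(xml_text)
    # accept several source shapes for the same underlying content
    sku = (root.findtext(".//sku")
           or next((e.get("code") for e in root.iter("item")), None))
    qty_text = (root.findtext(".//qty")
                or next((e.get("qty") for e in root.iter("item")), None)
                or "1")
    return sku, int(qty_text)  # the process's own datatype, not the instance's

# two structurally different instances of the same root element type
# arrive at the identical data instantiation
a = to_process_form("<order><sku>A1</sku><qty>2</qty></order>")
b = to_process_form("<order><item code='A1' qty='2'/></order>")
```

The database committed each instance on its own markup's terms; the process recovers the same data from both, on its terms.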
In my experience, what this native XML database engine has to do is discover, and then manipulate, the intersection of the properties of data or content in an original instance with the properties of that data which are of use to a particular process. I believe
that this is true instance-first database function, and thus natively appropriate to 'XML' which is the textual content of marked-up document instances.