OASIS Mailing List Archives

RE: Data storage, data exchange, data manipulation



[not all snips denoted]

> -----Original Message-----
> From: Nicolas LEHUEN [mailto:nicolas.lehuen@ubicco.com]
...
> Well, this issue has been worrying me a lot for two years 
> now, so I'd like to share my thoughts on the subject...
> 
> A] Data storage using XML
> 

> 
> So I believe there is a whole set of problems that will 
> benefit from XML databases (which are I believe based on the 
> hierarchical database model*, maybe Mike can confirm/infirm). 
> The storage, indexation and querying of a set of 
> document-oriented data is a good example. 

I think object-relational databases have some promise. Knowing how to
decompose an XML hierarchy just enough to yield an efficient relational
model is more of an art than a science right now: I don't think you get much
benefit if all parent-child relations are rigorously broken down into
primary-key/foreign-key pairs, for instance. Knowing how the data will be
fetched is the main design criterion of an object-relational model, with
performance gains for 'fixed' fetches coming at the cost of degrading some
ad-hoc queries (adding an XPath-based index for complex XML elements stored
in columns might speed up finding, but not fetching).
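
To make the "decompose just enough" idea concrete, here is a minimal
sketch using Python's standard sqlite3 and ElementTree modules. The order
document, the table layout, and every function name are assumptions made up
for illustration: only the fields used for lookup are broken out into
columns, while the nested <items> element is kept whole as XML text.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative only.
ORDER_XML = """
<order id="42">
  <customer>ACME</customer>
  <items>
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </items>
</order>
"""

def store_order(conn, xml_text):
    """Shred only the lookup fields; keep <items> as an XML column."""
    root = ET.fromstring(xml_text)
    items_xml = ET.tostring(root.find("items"), encoding="unicode")
    conn.execute(
        "INSERT INTO orders (id, customer, items_xml) VALUES (?, ?, ?)",
        (int(root.get("id")), root.findtext("customer"), items_xml),
    )

def fetch_items(conn, order_id):
    """The 'fixed' fetch: one row by primary key, no joins."""
    row = conn.execute(
        "SELECT items_xml FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return [(i.get("sku"), int(i.get("qty"))) for i in ET.fromstring(row[0])]

def orders_containing(conn, sku):
    """An ad-hoc query: every stored fragment must be parsed and scanned."""
    hits = []
    rows = conn.execute("SELECT id, items_xml FROM orders").fetchall()
    for order_id, items_xml in rows:
        if any(i.get("sku") == sku for i in ET.fromstring(items_xml)):
            hits.append(order_id)
    return hits

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, items_xml TEXT)"
)
store_order(conn, ORDER_XML)
```

Fetching one order's items touches a single row, but asking which orders
contain a given SKU forces a scan of every stored fragment. That is the
trade-off described above: the layout favours the fetch it was designed
for and penalises the ad-hoc query.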

> 
> B] Data exchange using XML
> 
> Anyway, whatever the database model you chose, you'll have to 
> exchange data between your database and other systems (a 
> business application, another database, etc.). As everyone is 
> not using the same data model, you'll have to find a data 
> model for your data exchange that :
> 
> - can capture most semantics of your data => it has to have a 
> way to express basic structure.
> - be as simple as possible to allow for a wide audience => we 
> should look for the "largest common divisor" (from which you 
> can build any other models by adding things) rather than the 
> "least common multiple" (from which you can obtain any other 
> models by building subsets).
> - can easily be sent on a wire => the serialized form of the 
> structure has to be easily parseable and standardized.

Here I would mention validation, and the incessant proliferation of
duplicate constraints caused by the physical models being different while
the conceptual model remains the same. Each physical model tends to
replicate the same constraints defined by the conceptual model: the more
the data is passed around, the more duplicate type-checking and
codependency-checking of values is done. This is a real problem, because:
	type systems differ between physical models
	some constraints are declarative, some operational (and they overlap
	in functionality in places)
	some physical models fail to make a clear distinction between data
	model and process model

I keep dreaming of the day (I may have to spec this myself; I'll call it the
Post Object-Oriented Process-Schema Infrastructure Initiative [POOPSII])
when all physical models can derive their constraints from the same
conceptual definition. It wouldn't avoid duplication of constraint
validation, but at least the constraints would come from the same common
model automatically (through code generation or inference), rather than
being hand-coded from a spec doc or hand-mapped from the conceptual data
model to multiple physical ones, with all the attendant errors.
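
As a rough illustration of deriving constraints from one conceptual
definition, the sketch below generates both a SQL table definition with
CHECK constraints and an in-memory validator from the same declarative
description. The CONCEPTUAL_MODEL format, the rule keys ("min", "max",
"max_length"), and the function names are all invented for this example;
they do not come from any existing specification.

```python
# Hypothetical conceptual definition: one declarative description per field.
CONCEPTUAL_MODEL = {
    "person": {
        "age": {"type": "integer", "min": 0, "max": 150},
        "name": {"type": "string", "max_length": 80},
    }
}

SQL_TYPES = {"integer": "INTEGER", "string": "TEXT"}
PY_TYPES = {"integer": int, "string": str}

def to_sql_ddl(entity):
    """Derive a CREATE TABLE statement with CHECK constraints."""
    cols = []
    for field, rule in CONCEPTUAL_MODEL[entity].items():
        checks = []
        if "min" in rule:
            checks.append(f"{field} >= {rule['min']}")
        if "max" in rule:
            checks.append(f"{field} <= {rule['max']}")
        if "max_length" in rule:
            checks.append(f"length({field}) <= {rule['max_length']}")
        col = f"{field} {SQL_TYPES[rule['type']]}"
        if checks:
            col += f" CHECK ({' AND '.join(checks)})"
        cols.append(col)
    return f"CREATE TABLE {entity} ({', '.join(cols)})"

def validate(entity, record):
    """Derive an in-memory check from the same definition; return errors."""
    errors = []
    for field, rule in CONCEPTUAL_MODEL[entity].items():
        value = record.get(field)
        if not isinstance(value, PY_TYPES[rule["type"]]):
            errors.append(f"{field}: expected {rule['type']}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field}: above maximum {rule['max']}")
        if "max_length" in rule and len(value) > rule["max_length"]:
            errors.append(f"{field}: longer than {rule['max_length']}")
    return errors
```

The validation still runs twice, once in the database and once in the
application, but both checks are generated from the same source, so they
cannot drift apart the way hand-coded copies do.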

> 
> Surprise, surprise, XML is AFAIK the right answer to these 
> needs. Things as simple as CSV files do not enable us to 
> capture enough semantics, and more complicated solution like 
> Java serialized objects or CORBA objects-by-values are 
> overkill. Here are some other arguments in favor of the 
> hierarchical model :

Add a schema, with type safety as well as structural constraints!

> 
> So, there is a third usage of XML, apart from data exchange 
> and persistent data storage : transitory, in-memory data 
> representation. Of course, it is more a matter of 
> representing data as a node-labeled tree than representing it 
> as serialized XML with tags and all, but the "XML spirit" is 
> there. This usage has a lot of advantages, at least for 
> front-end applications :

I think you're hinting at a looser coupling of process model and data model,
and it's a cantankerous issue (as you know). One argument you might try: data
models are cut-and-dried affairs (if you know how to normalize) and data
models stabilize over time. Process models share neither of these traits--
they are subjective and change frequently as business processes are
redefined and reorganized.

I think to a large degree you can separate data model development from
process model development; process modeling generally starts at a high level
of granularity, whereas a data model already exists in some form or other in
any business.

>
> - contrary to Java class definitions, XML schemas (or 
> schemata if you prefer :) are quite difficult to read and 
> write. The "difficult to read" issue can be solved by schema 
> documentation tools. XML Spy, for example, can generate a 
> pretty good documentation based on a W3C XML Schema (though 
> some current limitations prevent us for using this feature 
> efficiently). The "difficult to write" issue can be tackled 
> using tools, but unfortunately having a good editor is not 
> very helpful if the schema meta-model is inherently complex. 
> This is why we are looking for a simple, readable schema language.

One will emerge, but I do like XML Schema for what it tried to do. I think
it's an important milestone, but probably a waypoint to something better.

> 
> - compile-time checks are not performed. If you call 
> person.setFavoriteColour() on a Person instance, and the 
> Person class has not this method, you will get a compile-time 
> error. Using Java + DOM, a compiler cannot see an error when 
> you try to add the "favoriteColour" attribute or child 
> element to a "person" element. As we have developed a custom 
> XML language compiled to Java code, we feel that it is 
> possible to make the compiler schema aware, thus enabling 
> compile-time checks when the schema of the manipulated 
> documents are known.
> 

JAXB and Castor come to mind. JAXB will only be DTD compliant in the first
release, however :-(
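
The binding idea can be sketched in a few lines. This is not JAXB or
Castor, and since Python has no compile step it only approximates the
quoted point at runtime: a hand-written class (which a binding tool would
generate from the schema) rejects fields the schema does not define, while
a generic tree API accepts them silently. The Person class and its fields
are hypothetical.

```python
import xml.etree.ElementTree as ET

class Person:
    """Hypothetical bound class; __slots__ lists the only fields allowed."""
    __slots__ = ("name", "age")
    _XML_ATTRS = {"name", "age"}

    def __init__(self, name, age):
        self.name = name
        self.age = age

    @classmethod
    def from_xml(cls, text):
        """Build a Person from a <person> element, rejecting unknown attributes."""
        elem = ET.fromstring(text)
        unknown = set(elem.attrib) - cls._XML_ATTRS
        if unknown:
            raise ValueError(f"attributes not in schema: {sorted(unknown)}")
        return cls(elem.get("name"), int(elem.get("age")))

    def to_xml(self):
        """Serialize back to a <person> element."""
        elem = ET.Element("person", name=self.name, age=str(self.age))
        return ET.tostring(elem, encoding="unicode")

# Generic tree API: nothing stops an attribute the schema never defined.
raw = ET.fromstring('<person name="Ann" age="30"/>')
raw.set("favoriteColour", "blue")

# Bound class: the same mistake fails as soon as it is attempted.
person = Person.from_xml('<person name="Ann" age="30"/>')
try:
    person.favorite_colour = "blue"
    rejected = False
except AttributeError:
    rejected = True
```

In a statically typed language with generated classes, the equivalent of
the rejected assignment would be caught by the compiler rather than at
runtime.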


> - "this is not pure object oriented programming !" I don't 
> know if it is the same out there, but here in France it 
> matters to a lot of people. My current 5 seconds answer is 
> that presentation layers usually do not require a pure object 
> oriented model for data. It does not mean that the underlying 
> framework of the layer is not object-oriented, far from it !

Who cares? Should methods be tightly bound to data? No, because 85% of
constraints can be stated declaratively... methods are overkill. The
remaining 15% are generally business rules, and may not apply to all
businesses. For those that do, methods are necessary, but there are fewer of
them in this scenario.

----------

I hope my blithering made some sense; I'll be happy to clarify, retract, or
backpedal what doesn't.

-- Jeff