RE: Data storage, data exchange, data manipulation

From: Nicolas LEHUEN <nicolas.lehuen@ubicco.com>
To: 'Jonathan Borden' <jborden@mediaone.net>
Date: Fri, 29 Jun 2001 18:33:48 +0200
 

-----Original Message-----
From: Jonathan Borden [mailto:jborden@mediaone.net]
Sent: Friday, June 29, 2001 15:32
To: Nicolas LEHUEN
Cc: xml-dev@lists.xml.org
Subject: RE: Data storage, data exchange, data manipulation



Nicolas LEHUEN wrote: 

> 
> I don't think a node-labeled tree (the XML model is a tree, more 
> restricted than a graph) structure can model all kind of data 
> easily and efficiently. 

There is a wealth of discussion on representing graphs in XML. Techniques 
include ID/IDREF, XLink/XPointer, RDF, TM etc, etc. The easy and efficient 
part depends on the implementation. 

Agreed, you can represent graphs in XML using tricks. That's exactly what
I meant by 'easy and efficient': to store a graph, it's better to use a
database that can store graphs natively than to use tricks and store it in
an XML or relational database. I *do* worry about the implementation,
because that's what I need, not mathematical proofs that data structures
are equivalent.
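
To make the "tricks" concrete, here is a minimal sketch of the ID/IDREF
approach in plain Java + DOM. The <node id="..." refs="..."/> vocabulary and
the class name are made up for illustration; the point is only that the edges
are stored as strings and need a second pass before you get a real graph.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IdRefGraph {
    // Hypothetical vocabulary: <node id="..." refs="space-separated ids"/>.
    // The cycle a -> c -> a cannot be expressed by element nesting alone.
    static final String XML =
        "<graph>"
      + "<node id='a' refs='b c'/>"
      + "<node id='b' refs='c'/>"
      + "<node id='c' refs='a'/>"
      + "</graph>";

    static Map<String, List<Element>> resolve(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("node");

        // Pass 1: index every node by its id attribute.
        Map<String, Element> byId = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            byId.put(e.getAttribute("id"), e);
        }

        // Pass 2: turn each reference string into an actual edge.
        Map<String, List<Element>> edges = new LinkedHashMap<>();
        for (Element e : byId.values()) {
            List<Element> targets = new ArrayList<>();
            for (String ref : e.getAttribute("refs").trim().split("\\s+")) {
                Element target = byId.get(ref);
                if (target != null) {
                    targets.add(target); // dangling references are dropped in this sketch
                }
            }
            edges.put(e.getAttribute("id"), targets);
        }
        return edges;
    }

    public static void main(String[] args) throws Exception {
        resolve(XML).forEach((id, targets) ->
            System.out.println(id + " has " + targets.size() + " outgoing edge(s)"));
    }
}
```

A store that holds graphs natively would hand you the edges directly; here
the application has to rebuild them on every load.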

 > 
> So I believe there is a whole set of problems that will benefit 
> from XML databases (which are I believe based on the hierarchical 
> database model*, maybe Mike can confirm/infirm). The storage, 
> indexation and querying of a set of document-oriented data is a 
> good example. 

Similarly there is a lot of work on XML 'enabling' of relational databases.


Right, and part of this work tackles how to represent a hierarchical data
structure in a relational database (a hierarchical DB does not have this
problem). Another part of this work is how to exchange data, using XML,
between relational databases storing relational data and other systems,
and that is much more interesting.

> 
> But XML databases isn't or (won't) be a revolution, blasting all 
> other storage models. We could even say that the XML database 
> model is just a come back of the hierarchical model that was 
> supposedly "killed" by the relational model back in the 80s. I 
> don't think XML databases are the "next thing". 

If we forget hype and terminology, there *is* real work being done on the 
XML Query Model and XML Query languages. One needs to distinguish between a 
query model and an internal representation. 

Does implementing an XML view of the inner data, and implementing XML Query
to build/manipulate this view, make a database an XML database? I think
it's possible to do so whatever the underlying data model is. The internal
data model can be much richer than a hierarchical one, while still exposing
a hierarchical view as its external interface. This way, you could use a
consistent XML interface to query any kind of data, be it hierarchical or
not, and yes, the work on XML Query would be of the highest interest.
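
A rough sketch of that idea, under loud assumptions: an in-memory list of
rows stands in for a relational table, the class and method names are
invented, and XPath (through the standard javax.xml.xpath API) stands in for
XML Query, which is a different and richer language.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlView {
    // Stand-in for a relational table: each row is {name, city}.
    static final List<String[]> ROWS = Arrays.asList(
        new String[] {"Ann", "Paris"},
        new String[] {"Bob", "Lyon"});

    // Expose the rows as a hierarchical view; the storage itself is untouched.
    static Document view(List<String[]> rows) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("people");
        doc.appendChild(root);
        for (String[] row : rows) {
            Element person = doc.createElement("person");
            person.setAttribute("name", row[0]);
            person.setAttribute("city", row[1]);
            root.appendChild(person);
        }
        return doc;
    }

    // Query the view, not the table. The name is concatenated for brevity,
    // so this sketch assumes it contains no quote characters.
    static String cityOf(Document view, String name) throws Exception {
        return XPathFactory.newInstance().newXPath()
            .evaluate("/people/person[@name='" + name + "']/@city", view);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cityOf(view(ROWS), "Bob"));
    }
}
```

The caller only ever sees the /people/person view, so the rows could just as
well come from an RDBMS, an ODBMS or a hierarchical store.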

However, I don't think we can reduce all data structures to the
hierarchical model. Again, I'm not saying the hierarchical model is bad. I
think it suits one class of problems perfectly and doesn't suit the rest,
like every data model. History has proven this.

I agree with you, let's drop the assumption XML Query => XML storage. XML
and XML Query are an opportunity to get a consistent system for building
views on underlying databases, regardless of whether they are hierarchical,
relational or object-based. So why not open the game to RDBMS and ODBMS
vendors?

> ...AFAIK, the only ways a computer 
> can exchange data with a human being are serial, and I feel that 
> hierarchised text or speeches are the highest form of structured, 
> serialized data that we can understand. 

actually humans are not limited to serial data formats, e.g. images and 
voice which is interpreted by humans in a nonlinear, non serial fashion.  

Well, I'm not a specialist, but I believe that even if image or voice
recognition processes are not serial/linear, your thread of consciousness
is. When I want to have a view of a complex system, I have to read an
article from the beginning to the end. When I look at a schema (say, a
relational conceptual model), I have to read it little by little. In this
case, hierarchical data is way more understandable than a complex graph,
because it has a natural top-down ordering. But maybe that's my own way of
understanding things :)

> 
> C] In-memory data storage and manipulation using XML 
> 
> This last point on presentation is very important to me, as its 
> consequences finally made me to abandon any attempt of modeling 
> data as full-featured objects in the development of presentation 
> layers 

The work on types in schema languages would suggest otherwise. The DOM (L2) 
is limited to a small number of fixed types "element" "attribute" "text" 
.... When type support becomes a part of DOM Level ?.? the DOM does become a
general 'object' model. It is not at all clear to me that representing 
everything in a DOM _saves_ memory.  

The W3C DOM and its current implementations are, indeed, not schema-aware,
poor in features (they lack XPath support, for example, except for MSXML's
magic selectNodes() method or its equivalent in dom4j), and greedy on
memory. But nobody said that we were forced to use it. In my mail, I used
DOM in its original meaning, "Document Object Model", not as a reference to
the W3C DOM.

> 
> To save development time, deployment time, and memory (all thoses 
> classes modeling data come at a high price), we chose not to 
> model data as full-featured object, but simply as XML DOM 
> Documents. ... 
> 
> - it saves a lot of memory by removing application-specific 
> classes and replacing it with a small set of classes, the DOM. 

The statement that use of the DOM saves memory is against many common 
beliefs. (Indeed I try to use SAX as much as possible to avoid the DOM 
memory overhead.) Can you support these claims with some data?  

No, because we have to compare comparable things. The SAX processing model
is completely different from the DOM one. If you can afford processing your
data with SAX, do it, because it only consumes a limited amount of memory
based on the depth of your document, whereas DOM uses memory based on the
size of the document.
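
As an illustration of the SAX side of this, here is a small sketch of a
handler whose only state is a few counters, however large the input is. The
class name and the scan() helper are hypothetical; the SAX calls are the
standard JAXP ones.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DepthHandler extends DefaultHandler {
    private int depth = 0; // current nesting level, the only per-event state
    int maxDepth = 0;      // deepest nesting level seen so far
    int elements = 0;      // total number of elements seen so far

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        elements++;
        if (depth > maxDepth) {
            maxDepth = depth;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    // Hypothetical helper: stream an XML string through the handler.
    static DepthHandler scan(String xml) throws Exception {
        DepthHandler handler = new DepthHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        DepthHandler h = scan("<a><b><c/></b><b/><b/></a>");
        // prints "5 elements, max depth 3"
        System.out.println(h.elements + " elements, max depth " + h.maxDepth);
    }
}
```

Adding a million more <b/> siblings changes the element count but not what
the handler has to remember; a DOM built from the same input would grow with
every one of them.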

However, sometimes you can't use SAX, either because you're doing weird
things (like representing a graph in XML through ID/IDREF tricks), or
because a tree is simply easier to work with. You can't always emit output
SAX events in the same order that you receive input SAX events (just ask an
XSLT implementor).

> 
> - it saves a lot of time and energy by the sheer flexibility of 
> XML. 

This is where XML wins hands down in my book. 

> 
> - contrary to Java class definitions, XML schemas (or schemata if 
> you prefer :) are quite difficult to read and write. 

And people will complain that XSLT is more difficult to read and write than 
Java, so? 

It is, sometimes. I have to say that I initially had more difficulty with
the template matching system in XSLT than with polymorphic calls in Java.
I'm sorry, but I am trying to develop a framework that is as simple as
possible. This implies, in these times when qualified people are scarce,
that I can't possibly ask each and every developer who wants to use my
framework to learn XML Schema, XSLT, etc. So I have looked for
simplification wherever possible.

As an example, if you have to transform an XML document whose schema is non
recursive (a fixed-depth tree), you don't need to write XSLT-like
declarative templates, you can write a simple, straightforward, nearly
procedural transformer. You can do this using XSLT, with very simple
stylesheets (using xsl:value-of and xsl:for-each, that's all). You need only
one template, the root template.
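
Here is roughly what such a stylesheet looks like, run through the standard
javax.xml.transform API. The <people>/<person> input and the class name are
invented for the example; the stylesheet has a single template matching the
root and uses nothing but xsl:for-each, xsl:value-of and xsl:text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FlatTransform {
    // Hypothetical fixed-depth input: a <people> root holding <person> elements.
    static final String XML =
        "<people>"
      + "<person name='Ann' city='Paris'/>"
      + "<person name='Bob' city='Lyon'/>"
      + "</people>";

    // One template, the root template; the body reads top to bottom like a loop.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='people/person'>"
      + "<xsl:value-of select='@name'/>"
      + "<xsl:text> lives in </xsl:text>"
      + "<xsl:value-of select='@city'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Prints one "<name> lives in <city>" line per person.
        System.out.print(transform(XML, XSL));
    }
}
```

Nothing in the stylesheet depends on template matching rules or priorities,
which is the whole point for a developer who is new to XSLT.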

You don't have to learn the template matching process, which is very
powerful but not trivial (especially when context-sensitive templates are
required). You'll use it later, but you get a gentler learning curve. I
know, I've read some articles saying that xsl:for-each is bad and that we
should use templates as often as possible, for extensibility's sake (I think
it was in the DocBook XSLT documentation). But hey, not all developers are
rocket scientists who implement a new DocBook stylesheet every morning!

This non-recursive case is quite interesting, as it is easier to implement
transformers for this specific case. Also, you can quite easily implement a
WYSIWYG editor for stylesheets. If complex template matching is required, I
can't think (but maybe smarter people can) of a simple way to represent the
result of a transformation in real time.

Again, XML and all derived technologies are very powerful tools, created by
very smart people. But using XML in more and more situations means that it
will be used by more and more people, so we have to keep it simple.

> 
> - compile-time checks are not performed. If you call 
> person.setFavoriteColour() on a Person instance, and the Person 
> class has not this method, you will get a compile-time error. 
> Using Java + DOM, a compiler cannot see an error when you try to 
> add the "favoriteColour" attribute or child element to a "person" 
> element. As we have developed a custom XML language compiled to 
> Java code, we feel that it is possible to make the compiler 
> schema aware, thus enabling compile-time checks when the schema 
> of the manipulated documents are known. 

correct on all accounts. in this sense these usages of XML have the same 
issues as all dynamically typed and/or interpreted systems ... and the same 
solutions :-) 

Precisely what we did is a bit like extending Java with special expressions
and operators so that XML document creation/manipulation is more naturally
written. The result is way less powerful than what you can do with clever
XML Query queries, but it's powerful enough, and we have access to the full
range of Java libraries.
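
For readers who have not hit the problem quoted above, here is a small
sketch in plain Java + DOM (this is not our custom language, and the class
names are made up). The typed Person rejects a misspelled setter at compile
time, while the DOM happily accepts a misspelled attribute name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class UncheckedDom {
    // Typed version: the compiler knows exactly which properties exist.
    static class Person {
        private String favoriteColour;
        void setFavoriteColour(String colour) { favoriteColour = colour; }
        String getFavoriteColour() { return favoriteColour; }
    }

    // DOM version: attribute names are just strings, so nothing is checked.
    static Element buildPerson() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element person = doc.createElement("person");
        doc.appendChild(person);
        // Misspelled on purpose ("Color" instead of "Colour"): compiles and runs.
        person.setAttribute("favoriteColor", "blue");
        return person;
    }

    public static void main(String[] args) throws Exception {
        Person p = new Person();
        p.setFavoriteColour("blue");
        // p.setFavoriteColor("blue"); // would not compile: no such method

        Element e = buildPerson();
        // The intended attribute was never set; getAttribute returns "".
        System.out.println("favoriteColour = '" + e.getAttribute("favoriteColour") + "'");
    }
}
```

A schema-aware compiler could flag the setAttribute call the same way javac
flags the commented-out line.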

> 
> - "this is not pure object oriented programming !" I don't know 
> if it is the same out there, but here in France it matters to a 
> lot of people. 

that's too bad :-/  

That's cartesianism :)

My current 5 seconds answer is that presentation 
> layers usually do not require a pure object oriented model for 
> data. It does not mean that the underlying framework of the layer 
> is not object-oriented, far from it ! 

i'd just say "but of course we use Java so it *is* object oriented"!  

Well, Java being an object-oriented language is no guarantee that any given
Java code is really object-oriented... Just like creating a database in an
RDBMS is no guarantee that it will be in 4NF. Anyway, what I meant is that
you can write some nicely object-oriented code that processes
non-object-oriented data (in the sense that the meaning of the data is not
reified into a class).

> 
> - "yeah, but then how do I associate a behaviour to my data ?" 

In Texas you'd pull out a shotgun and blast at the general direction of the 
individual. In Boston we mumble some phrases about the "Semantic Web" ... 
"RDDL" ... wave our hands and then walk away :-) 

Well, in France you could not use a firearm so easily :).

OOP relies on the idea that objects contain data, but also behaviour.
Technologies like Java can be used to exchange not only data, but also
code, thanks to the VM abstraction. For example, in JINI a device can
declare itself by registering a service object, which contains data (where
the device is), but also behaviour (how to talk to the device), with the
service interface defining the semantics of the device.

By using XML more and more for data exchange, we have to face raw data
without behaviour. The behaviour has to live at one end of the exchange or
the other (or both). We try to make sure that the semantics of the data are
consistent on both ends, using namespaces and schemas. But what about the
behaviour of the objects associated with the data? For the moment, I'll walk
away, but maybe one day we'll have to think about it.

Regards,

----------------------------------------------------------- 
Nicolas Lehuen 
Responsable R&D - Head of R&D 
Ubicco - Multi Access Software Solutions 
http://www.ubicco.com/