as usual i'm watching a couple of threads without enough time to
respond or contribute, but here are a few ideas that apply to this
thread and to the data thread running at the moment:
1. xml seems to have started out as a self-describing document format -
dtds were always there, but optional
2. at some point that proved inadequate and various schema mechanisms
were developed in an attempt to be more robust
3. this parallels some of the early database work. an rdbms without some
sort of data dictionary is meaningless; you just have flat files. rdbms
plus data dictionary is the parallel to xml plus xsd or xml plus rng
4. rdbms and xml are both poor at representing connections between data
(we call them associations). an rdbms expresses them in sql
(select ... where ...) while xml is trying to use owl.
5. for any of this to work you need an external method to describe
associations - for my money i would add to the schema a relationship
clause that says something like "the relationship between this schema
and schema X is based on the value of element Y". we declare
associations between relations that way and it works well; i can't see
why it wouldn't work well for xml. this assumes we are dealing with sets
of documents conforming to a schema.
6. this of course does have some implications for external data and
synchronisation. associations between relations work efficiently because
they define the necessary indexes on the tables. xml associations would also
imply some sort of indexing for all documents that conform to a
particular schema. and that in turn would enable efficient association
of information, without using structural tricks or hardwired pointers.
it may even mean that schemas can be simplified and normalised.
7. even though we seem to use ontologies to survive as people, for the
most part i don't think an ontology as such is as important to
data/document handling as the ability to express the existence of a
relationship and the elements used to form the relationship. many
ontologies are clearly awkward at any rate, as names are given to all
things, even when they are trivial (e.g. an order contains order lines)
8. the ultimate convergence of xml and data could be achieved by using
metadata to describe the relationships between schemas, by normalising
schemas, and by accepting that null values in database attributes are
really just a device to maintain the "regularity" of the form. they
could simply be missing if there were another way to describe a
relation (tuple) - xml does this.
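a minimal sketch of points 5 and 6, in python with the standard
library's xml.etree.ElementTree. everything named here is an assumption
made for illustration only: the ASSOCIATION dict stands in for the
"relationship clause" (no schema language defines it in this form), the
order/customer/customer_id element names are invented, and build_index
and associate are hypothetical helpers. the point is just that once the
association names the element, the index it implies is obvious.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# hypothetical relationship clause: "order" documents relate to
# "customer" documents through the value of the element "customer_id"
ASSOCIATION = {"from": "order", "to": "customer", "element": "customer_id"}

# two small sets of documents, each assumed to conform to one schema
ORDERS = [
    "<order><order_id>1</order_id><customer_id>c1</customer_id></order>",
    "<order><order_id>2</order_id><customer_id>c2</customer_id></order>",
    "<order><order_id>3</order_id><customer_id>c1</customer_id></order>",
]
CUSTOMERS = [
    "<customer><customer_id>c1</customer_id><name>Ann</name></customer>",
    "<customer><customer_id>c2</customer_id><name>Bob</name></customer>",
]


def build_index(docs, element):
    """Index a set of documents by the text of one child element."""
    index = defaultdict(list)
    for doc in docs:
        key = doc.findtext(element)
        if key is not None:  # documents without the element are skipped
            index[key].append(doc)
    return index


def associate(from_docs, to_index, element):
    """Yield (from_doc, to_doc) pairs whose element values match."""
    for doc in from_docs:
        for match in to_index.get(doc.findtext(element), []):
            yield doc, match


orders = [ET.fromstring(s) for s in ORDERS]
customers = [ET.fromstring(s) for s in CUSTOMERS]

# the association implies this index on every "customer" document
customer_index = build_index(customers, ASSOCIATION["element"])

pairs = [
    (o.findtext("order_id"), c.findtext("name"))
    for o, c in associate(orders, customer_index, ASSOCIATION["element"])
]
print(pairs)
```

neither document carries a pointer to the other; the join comes entirely
from the declared element, which is how it works between relations.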
so the future? more work on normalised schemas (not documents, although
that is implied) and a way to express relationships (associations)
between schemas. it wasn't discussed, but a way to express projections
of documents would be good too. external storage mechanisms (such as
document indexing) should remain the domain of product developers rather
than standards - as they have in the database world.
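to make "projection of a document" concrete, here is a rough sketch
under the same assumptions as before (python, xml.etree.ElementTree,
invented element names). project is a hypothetical helper, not part of
any standard; it keeps only the named child elements, much as a
relational projection keeps only the named attributes.

```python
import copy
import xml.etree.ElementTree as ET


def project(doc, names):
    """Return a copy of doc keeping only the child elements in names."""
    out = ET.Element(doc.tag, doc.attrib)
    for child in doc:
        if child.tag in names:
            out.append(copy.deepcopy(child))
    return out


doc = ET.fromstring(
    "<customer><customer_id>c1</customer_id>"
    "<name>Ann</name><city>Perth</city></customer>"
)

# "fax" is requested but the document has no such element
view = project(doc, {"customer_id", "name", "fax"})
result = ET.tostring(view, encoding="unicode")
print(result)
```

the requested "fax" element is simply absent from the result instead of
showing up as a null, which is the point made in item 8.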
Hunsberger, Peter wrote:
>Gavin Thomas Nicol <email@example.com> writes:
>>On Oct 25, 2004, at 12:21 PM, Hunsberger, Peter wrote:
>>>>I've seen all kinds of data, including graphs, encoded in
>>>>as I have seen such data structures encoded in ASCII.
>>>Sure, but with an XML representation of a graph you're back to the
>>>application to parse the XML serialization into a graph.
>>>exchanged a graph, you've exchanged something that, given
>>>knowledge, someone else might be able to build a graph out of.
>>You have to do that anyway (unless you're using shared memory
>Well, you have to serialize and de-serialize, yes, but you may have
>better ways of portraying graph structure. In particular, id and idref
>gets a little painful if you're trying to do a lot of many to many
>mappings; you really want to normalize out the groupings of idrefs and
>use some explicit form of sub-graphs. XML gets fragile very quickly
>when you've got multiple paths through the network, picking the right
>path for any given context requires extra meta-metadata that is hard to
>manage. Perhaps an example:
>Our application is built around a lot of pseudo-graph traversal (no
>formal properties are tested for or explicitly exploited) of multiple
>XML instances. We depend on naming conventions to map/join across the
>various XML instances (save us copious id/idref mappings). If we didn't
>control the metadata that generates the instances it all would be very
>fragile. As it is, we're constantly running into cases where the users
>come up with use-cases that stretch our abilities to manage the
>relationships. If we had to do this with externally sourced XML I'm not
>sure how much of the capabilities we could expose, a WAG might be around
>50% before things just blow up so constantly that there'd be no point in
>I suspect if you really have to do the management of such structure
>across multiple independent domains then ontologies are the way to go.
>And if you've got to work with ontologies then some external mapping of
>them onto document structure also seems natural to me. That's no longer
>XML as far as I can tell?
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>The list archives are at http://lists.xml.org/archives/xml-dev/