- From: Steve Schafer <pandeng@telepath.com>
- To: xml-dev@xml.org
- Date: Sun, 06 Feb 2000 11:02:53 -0600
Strictly speaking, I suppose that groves are somewhat off-topic for
this group, but since there is already a lot of discussion about them
here, and since the small group of people with real understanding of
and experience with groves seems to be well represented here, I'll
give it a shot:
I was rereading some old material on groves, and came across the
following in a post by Eliot Kimber to comp.text.sgml (it was at the
end of a paragraph discussing the definition of customized property
sets for various kinds of data; the full context is available at
http://www.oasis-open.org/cover/grovesKimber1.html):
"However, there is no guarantee that the property set and grove
mechanism is capable of expressing all aspects of any notation other
than SGML."
(Notes 440 and 442 in section A.4 of the HyTime spec say much the same
thing.)
On the face of it, this is a perfectly sensible thing to say. At the
same time, however, it is rather disturbing, because it suggests that
there might exist data sets for which the grove paradigm is wholly
unsuited. I would certainly hate to expend a lot of effort building a
grove-based data model for a data set, only to discover part way
through that groves and property sets simply won't work for that data
set.
In the world of computing, we can rest easy knowing that there exists,
at least conceptually, a Universal Turing Machine, and that such a
machine, given an appropriate program, is capable of computing
anything that is computable.
So the first question is this:
1) Does a Universal Data Abstraction exist?
Note that, like a Universal Turing Machine, such an abstraction need
not be particularly efficient or otherwise well suited to any specific
task. The only requirement is that it be universal in the sense of
being capable of representing any conceivable data set (or at least
any "reasonable" data set). (And no, I don't have a formal definition
of what "reasonable" would mean in this context; all I can say is that
the definition itself should be reasonable....) The real importance of
a Universal Data Abstraction is that it would provide a formal basis
for the construction of one or more Practical Data Abstractions.
Assuming that the answer is "yes" (and I have no real justification
other than optimism for believing that it is), the second question
follows immediately:
2) Does the grove paradigm, or something similar to the grove
paradigm, constitute a Universal Data Abstraction?
If one is feeling contrary, it would be easy to answer "no" to the
second question by providing an example that answers the third
question in the affirmative:
3) Does there exist any "reasonable" data set for which the grove
paradigm inherently cannot provide an adequate representation?
When attempting to answer this third question, it is important to
avoid getting caught up in unwarranted topological arguments. The
topology of groves may not map onto the topology of a particular data
set, but that does not mean that that data set is unrepresentable as a
grove. Consider XML: An XML document consists of a linear, ordered
list of Unicode characters, yet the XML format is quite capable of
representing any arbitrary directed acyclic graph.
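To make the point concrete, here is a minimal Python sketch (illustrative only, not part of the original discussion) of how a strictly linear character stream can carry a structure that is not a tree. It assumes a hypothetical flat encoding in which each vertex becomes a `<node id="...">` element and each arc becomes an `<edge ref="...">` child that points at its target by name, loosely in the spirit of XML's ID/IDREF attributes but without a DTD. The function names `dag_to_xml` and `xml_to_dag` and the sample graph are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample DAG: vertex "d" has two parents ("b" and "c"),
# so this structure cannot be expressed by element nesting alone.
DAG = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}


def dag_to_xml(dag):
    """Serialize an adjacency mapping as flat XML text (assumed encoding)."""
    root = ET.Element("graph")
    for name, children in dag.items():
        node = ET.SubElement(root, "node", id=name)
        for child in children:
            # The arc refers to its target by name instead of nesting it,
            # which is what lets a linear text encode shared children.
            ET.SubElement(node, "edge", ref=child)
    return ET.tostring(root, encoding="unicode")


def xml_to_dag(text):
    """Parse the flat XML back into an adjacency mapping."""
    root = ET.fromstring(text)
    return {
        node.get("id"): [edge.get("ref") for edge in node.findall("edge")]
        for node in root.findall("node")
    }


text = dag_to_xml(DAG)
print(text)
# Round trip: the graph survives serialization to a character string.
assert xml_to_dag(text) == DAG
```

Vertex `d` is shared, yet the graph round-trips through plain text. The same reference-by-name trick would also accommodate cycles; what it gives up is that the graph's shape is no longer visible in the element nesting. That is exactly the kind of topological mismatch that need not imply unrepresentability.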
========
On a somewhat related note, I've noticed that in discussions regarding
the Power of Groves, the arguments by the proponents seem to fall into
two distinct groups. On the one hand, some people see groves as being
quite universal in their applicability. On the other, some people talk
about groves almost exclusively within the context of SGML, DSSSL
and/or HyTime. As an outsider and relative latecomer to the party, I
find it difficult to determine whether this dichotomy of viewpoints is
real, or merely reflects the differences in the contexts in which the
discussions have taken place. If the schism _is_ real, it would be
helpful if those sitting on either side of the fence could add their
thoughts regarding why the schism is there, and why the people on the
other side are wrong. :)
The property set definition requirements in section A.4 of HyTime
illustrate why this question concerns me. The
definition of property sets is given explicitly in terms of SGML. That
is, a property set definition _is_ an SGML document. But it seems to
me that if property sets have any sort of widespread applicability
outside of SGML, then a property set definition in UML or IDL or some
other notation would serve just as well (assuming that those other
notations are sufficiently expressive; I'm fairly confident that UML
is, but I'm not so sure about IDL).
Of course, it can be argued that _some_ notation had to be used, so
why not SGML? My response is that I believe the mathematical approach
of starting with a few extremely basic axioms and building on them as
needed to develop a relevant "language" for expressing a model would
be far superior, as it would allow people to fully visualize the
construction of the property set data model (or "metamodel," if you
prefer) without getting bogged down in arcane SGML jargon. After all,
SGML can hardly be described as minimalist.
(An aside: I believe that a lot of the resistance to SGML and HyTime
stems from the limitation of identifiers to
eight characters, leading to such incomprehensible abominations as
"rflocspn" and "nmndlist." Learning a completely new body of ideas is
hard enough without having to simultaneously learn a foreign--not to
mention utterly unpronounceable--language.)
This situation with property set definitions reminds me of the recent
discussions in this group regarding the chicken-and-egg relationship
between the XML notation and the XML data model. The absence of a
pre-existing data model for XML leads to a scenario in which everyone
who uses XML builds their own mutually-slightly-incompatible data
models. While I can't prove that the same has happened or will happen
with property set definitions and other related aspects of the grove
paradigm, I think such a thing is certainly plausible. As with XML,
the question boils down to which is more fundamental: the notation or
the data model.
-Steve Schafer