- From: "Mark L. Fussell" <fussellm@alumni.caltech.edu>
- To: xml-dev@ic.ac.uk
- Date: Sat, 22 Nov 1997 01:06:15 -0800 (PST)
This is somewhat related to the recent threads on Integrity and
Inheritance. It is again a bit long so it will be duplicated at MONDO
(www.chimu.com/projects/mondo).
========
I suggest that SGML/XML be perceived as a markup language to describe how
to build information instead of describing (and modeling) the information
itself. This may appear to be a subtle distinction but it has a lot of
implications.
I will start with a recent concrete example from Rick Jelliffe
<ricko@allette.com.au>:
<!ELEMENT citation ( title, text, url)>
This says a citation is composed of (through its content) a title, text,
and url. But do not view that as the information model of a citation;
consider it a recipe for a citation. We can build a citation if we
supply the three (named) ingredients: title, text, and url. The detail
of the resulting information (which I will call an object) is unknown.
It is likely that the citation object will have these three attributes,
but it could have more or it could even discard some of them (in which
case the recipe included information that the model did not need).
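The recipe view can be sketched in a few lines of code. This is only an illustration, assuming Python and its standard xml.etree.ElementTree module; the sample instance, the Citation class, the build_citation function, and the derived 'host' field are all hypothetical names. The built object keeps two ingredients, discards one, and adds something the recipe never mentioned:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample instance that follows the citation recipe.
SAMPLE = """
<citation>
  <title>An example title</title>
  <text>Some descriptive text.</text>
  <url>http://www.example.com/</url>
</citation>
"""


@dataclass
class Citation:
    # The shape of the object is the builder's choice, not the DTD's:
    # it keeps title and url, discards text, and adds a derived field.
    title: str
    url: str
    host: str


def build_citation(element: ET.Element) -> Citation:
    """Follow the recipe: collect the named ingredients, then build."""
    ingredients = {child.tag: (child.text or "").strip() for child in element}
    url = ingredients["url"]
    # Derive the host part of the URL with plain string handling.
    host = url.split("//", 1)[-1].split("/", 1)[0]
    return Citation(title=ingredients["title"], url=url, host=host)


citation = build_citation(ET.fromstring(SAMPLE))
print(citation)
```

The content model only tells the builder which ingredients to expect; what the resulting object looks like is decided elsewhere.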
If we have a different element that requires more information we could
have a different recipe:
<!ELEMENT DetailedCitation ( title, text, name, text, url )>
The object that results from this recipe might be the same type as a
citation object, a subtype of the citation object (i.e. treatable as a
citation object but has more capabilities), or even an unrelated type of
object. For the moment we will refrain from discussing anything about the
objects resulting from the DetailedCitation and the Citation recipes [why
I started capitalizing will be explained later too].
What about combining the two recipes into a single element? We could
combine them as:
<!ELEMENT Citation ( ( title, text, url) | (title, text, name,
text, url) )>
<!ELEMENT Citation ( title, (text, name)?, text, url )>
<!ELEMENT Citation ( title, text, (name, text)?, url )>
The first two would be ambiguous (in SGML terms), but all three are bad
recipes. They are bad because we (or the computer) must look at all the
content to know which version we are using. This is analogous to reading
a whole recipe before we can be sure what we are trying to make. It would
be better to separate the choice of option clearly from the requirements
that follow once that option is chosen. Our original version separated
the recipes through the elements:
<!ELEMENT Citation ( title, text, url)>
<!ELEMENT DetailedCitation ( title, text, name, text, url )>
We could also do this with:
<!ELEMENT Citation ( basicInfo & detailedInfo? )>
<!ELEMENT basicInfo ( title, text, url)>
<!ELEMENT detailedInfo ( text, name)>
or:
<!ELEMENT Citation ( basic | detailed )>
<!ELEMENT basic ( title, text, url)>
<!ELEMENT detailed ( title, text, url, text, name)>
In these forms it is explicit what we are trying to build (or at least
the complexity is dramatically reduced). We do not have to look into the
details of the information itself.
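For the ( basic | detailed ) form, a builder might look like the following sketch. It assumes Python's standard xml.etree.ElementTree module; the BUILDERS table, the function names, and the sample instance are hypothetical. The point is that the recipe is chosen from the name of the single child element, before any of its content is read:

```python
import xml.etree.ElementTree as ET


def build_basic(el):
    # Ingredients of the 'basic' recipe.
    return {"kind": "basic", "title": el.findtext("title"),
            "url": el.findtext("url")}


def build_detailed(el):
    # Ingredients of the 'detailed' recipe.
    return {"kind": "detailed", "title": el.findtext("title"),
            "url": el.findtext("url"), "name": el.findtext("name")}


# One builder per element name: the name alone selects the recipe.
BUILDERS = {"basic": build_basic, "detailed": build_detailed}


def build_citation(citation_el):
    """Citation ( basic | detailed ): exactly one child, chosen by name."""
    children = list(citation_el)
    if len(children) != 1:
        raise ValueError("Citation must contain exactly one variant")
    variant = children[0]
    if variant.tag not in BUILDERS:
        raise ValueError("unknown Citation variant: " + variant.tag)
    return BUILDERS[variant.tag](variant)


# Hypothetical sample instance.
SAMPLE = ("<Citation><detailed><title>T</title><text>a</text>"
          "<url>U</url><text>b</text><name>N</name></detailed></Citation>")
result = build_citation(ET.fromstring(SAMPLE))
print(result["kind"])
```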
RECIPES
=======
Now I will ask for a leap of faith.
Consider separating ELEMENTs between Recipes that build objects and
Parameters that name the ingredients that are required for a particular
recipe. As an architectural-form it would look like this:
<!ELEMENT Recipe (parameter)*>
<!ELEMENT parameter (Recipe)>
Although in the content model parameters are sequential, their order is
insignificant semantically. Each parameter must have a unique name, so
consider them to be and-ed together instead of seq-ed. Sort of like:
<!ELEMENT Recipe (parameter)&*>
or like required element attributes.
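The "and-ed together" reading can be sketched as follows, assuming Python's standard xml.etree.ElementTree module (read_parameters and the two sample instances are hypothetical). Parameters are collected into a mapping keyed by name, so a duplicate name is an error and the order in which the parameters were written makes no difference to the result:

```python
import xml.etree.ElementTree as ET


def read_parameters(recipe_el):
    """Collect a recipe's parameters by name, rejecting duplicates."""
    params = {}
    for child in recipe_el:
        if child.tag in params:
            raise ValueError("duplicate parameter: " + child.tag)
        params[child.tag] = (child.text or "").strip()
    return params


# The same ingredients written in two different orders.
a = read_parameters(ET.fromstring(
    "<Citation><title>T</title><text>X</text><url>U</url></Citation>"))
b = read_parameters(ET.fromstring(
    "<Citation><url>U</url><title>T</title><text>X</text></Citation>"))

print(a == b)  # True: mapping equality ignores the written order
```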
As a convention I will capitalize the Recipes and keep parameters in
lowercase. Now returning to our example, building a Citation requires
three parameters:
<!ELEMENT Citation ( title & text & url)>
The original ordering of the parameters is irrelevant to the
informational content because each parameter is uniquely named; having
them be sequential is only a presentation/encoding restriction.
Also, the parameters do not describe the Types of the ingredients, just
the Role of them in building the recipe. All of 'title', 'text', and
'url' could be simple strings:
<!ELEMENT title (String)>
<!ELEMENT text (String)>
<!ELEMENT url (String)>
<!ELEMENT String (#PCDATA)*>
Or any of them could have a more complex type. By separating the two
types of elements we can:
  - Be very explicit about what we are constructing
  - Have a great deal of flexibility for reuse of elements
  - Use very simple content models that produce complex structures
Note that although the '&' is considered complex to implement, this
particular use of it has the same form as attributes: Parameters are
unordered and possibly required.
Shortcuts
---------
You might have noticed that String cheats: a String does not follow the
required Recipe pattern of having only parameters in content. This is a
convenience shortcut Recipe [OK, and an insanity prevention device],
which makes putting strings of text into this format easier.
Similarly we will probably need to have a shortcut for Lists (sequences)
of objects:
<!ELEMENT List (Recipe)*>
With these additions we have to modify our original description of the
architectural-form of Recipes to:
<!ELEMENT Recipe (parameter)*>
<!ELEMENT StringRecipe (#PCDATA)*>
<!ELEMENT ListRecipe (Recipe)*>
<!ELEMENT parameter (Recipe | StringRecipe | ListRecipe )>
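A builder for this architectural form can be quite small. The sketch below assumes Python's standard xml.etree.ElementTree module, and it assumes that elements named 'String' and 'List' play the StringRecipe and ListRecipe roles while every other element directly inside a parameter is an ordinary Recipe. The sample instance and its 'authors', 'Person', and 'name' elements are hypothetical:

```python
import xml.etree.ElementTree as ET


def build(recipe_el):
    """Build a plain Python value from a Recipe element."""
    if recipe_el.tag == "String":
        # StringRecipe shortcut: character data only.
        return recipe_el.text or ""
    if recipe_el.tag == "List":
        # ListRecipe shortcut: a sequence of Recipes.
        return [build(item) for item in recipe_el]
    # Ordinary Recipe: its content is parameters, each with a unique
    # name and each wrapping exactly one Recipe.
    parameters = {}
    for param in recipe_el:
        if param.tag in parameters:
            raise ValueError("duplicate parameter: " + param.tag)
        values = list(param)
        if len(values) != 1:
            raise ValueError("parameter must hold one Recipe: " + param.tag)
        parameters[param.tag] = build(values[0])
    return {"recipe": recipe_el.tag, "parameters": parameters}


# Hypothetical sample instance.
SAMPLE = """
<Citation>
  <title><String>On Recipes</String></title>
  <authors>
    <List>
      <Person><name><String>A. Author</String></name></Person>
      <Person><name><String>B. Author</String></name></Person>
    </List>
  </authors>
</Citation>
"""
obj = build(ET.fromstring(SAMPLE))
print(obj["parameters"]["title"])
```

The alternation of Recipe and parameter levels is what keeps the builder this simple: it never needs to know anything specific about Citation or Person.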
Recipes, DTDs, and DomainModels
-------------------------------
Each Recipe builds an object. What is the type of this object and how
does it relate to the ELEMENT content model? I propose (and agree with
others proposing) that there should be no required connection between the
rules of a recipe (the DTD) and the rules of the DomainModel objects
built from that recipe. Objects can have far more complex relationship
rules than DTDs can describe and the DTD will either over-constrain or
under-constrain the built objects.
Instead consider the DTD as similar to a UI Form. You may want to place
things in a particular order and group them together:
    Person
        FirstName LastName
        SSN
        Children
            FirstName LastName
But this is a presentation of the (view independent) information model
that has a person with several attributes and associations in no
particular order (even children do not need to be explicitly ordered,
because an ordering can be derived from [for example] each child's
birthdate). The UI/DTD can place constraints (such as requiring an SSN to
have the 123-45-6789 format) but it should be very careful about these
constraints (what about 99- SSNs?) or
really delegate the responsibility of validation to the DomainModel. But
simplified views are still useful.
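Delegating validation to the DomainModel might look like the sketch below, assuming Python's standard re and xml.etree.ElementTree modules. The Person class and build_person function are hypothetical, and the format rule is taken only from the 123-45-6789 example above; a real domain model would own whatever the real rule is. The recipe side passes the ingredients through untouched:

```python
import re
import xml.etree.ElementTree as ET


class Person:
    # Assumed rule, based only on the "123-45-6789" example format.
    # It lives in the domain model, so it can change without touching
    # the recipe or the DTD.
    _SSN_FORMAT = re.compile(r"\d{3}-\d{2}-\d{4}")

    def __init__(self, first_name, last_name, ssn):
        if not self._SSN_FORMAT.fullmatch(ssn):
            raise ValueError("SSN rejected by the domain model: " + repr(ssn))
        self.first_name = first_name
        self.last_name = last_name
        self.ssn = ssn


def build_person(el):
    """The recipe hands over its ingredients without validating them."""
    return Person(
        el.findtext("FirstName", default=""),
        el.findtext("LastName", default=""),
        el.findtext("SSN", default=""),
    )


person = build_person(ET.fromstring(
    "<Person><FirstName>A</FirstName><LastName>B</LastName>"
    "<SSN>123-45-6789</SSN></Person>"))
print(person.ssn)
```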
DTDs can still be used to produce an information model but it should be
possible to unlink the information model and have it start a more robust
life of its own (or to reverse the dependency). The Recipes should still
be useful because they encode the knowledge required to build the
information independently of how precisely or extensively it is modeled
(up to a point). The recipes can live on as the model grows.
And, in a strange circularity, information models are also (obviously)
information so they can again be encoded as recipes in SGML/XML and used
as metadata for the domain model. So although DTDs are not good
information models, there is nothing stopping SGML/XML from being a good
encoding for good information models.
--Mark
mark.fussell@chimu.com
ChiMu Corporation    Architectures for Information
info@chimu.com       Object-Oriented Information Systems
www.chimu.com        Architecture, Frameworks, and Mentoring