- From: Sean McGrath <email@example.com>
- To: firstname.lastname@example.org
- Date: Mon, 06 Mar 2000 11:22:42 +0000
Sorry for the repost, but there are some errors
in the text which throw the whole thing out
of whack. Not enough caffeine. Please accept
my apologies. Now back to normal transmission...
[Sean Mc Grath]
>> Oh how I wish the real world were so simple,
>> that a basic mapping system sufficed to
>> get real work done.
>Then you are missing the point of architectures. The point is not to
>enable transformations (although they can be used that way in some
>cases). The point is to enable a document to *declare* what higher-order
>things it conforms to in order to enable validation against those
>higher-order things (e.g., I am a valid HyTime document and you can
>prove it) and to enable processors to easily *recognize* that certain
>elements and attributes are mapped to semantic constructs they know how
>to process. That is, it lets you capture knowledge about how a document
>relates to some more general taxonomy of types in the document itself
>rather than in the code that processes the document. That makes the
>knowledge generally available to *any* processor, rather than only to
>the code that embodies the knowledge.
I see what you are saying, but declaring the relationship
to "higher-order things" is tantamount to expressing
a *transformation* to a DTD expressed in terms of those
higher-order things.
Question: What is the difference between these two:
1) Using an architectural DTD as a standardized way of expressing:
   a) the "higher order things" (architecture DTD)
   b) the transformation of instances from source
      DTDs to the architecture DTD
2) Using an ordinary DTD to express the "higher order things"
   and using XSLT to express the transformation to this
   higher-order DTD
I claim that (2) is every bit as declarative and
open as (1) and builds on existing tools. Moreover,
(2) scales to more complex transformations to
"higher order things" thanks to the expressive
power of XSLT.
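As a rough illustration of option (2) — all element and type names
here are invented for the sketch, not taken from any real DTD — the
mapping from a source DTD to a "higher order" DTD could be an
ordinary XSLT stylesheet:

```xml
<!-- Hypothetical sketch: map specialized source elements onto a
     generic "higher order" element, recording the source type as
     an attribute so no information is lost. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="HouseMotion | SenateMotion">
    <motion source-type="{local-name()}">
      <xsl:apply-templates/>
    </motion>
  </xsl:template>
  <!-- identity template: copy everything else through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The stylesheet itself is a declarative, standalone document, just as
the architectural mapping is.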
>If your XML work mostly involves applying specialized processing to
>documents over which you have complete control, then architectures may
>add little additional value. But if you are applying processing to a
>large set of documents over which you have little or no control (e.g.,
>the generic HyTime or XLink or Topic Map scenario) then architectures
>are the *only possible solution* that I've been able to think of.
>However, my experience is that there are few non-trivial XML problems
>that cannot be solved more effectively by using architectures *in
>addition to* our traditional set of tools.
>Architectures also provide an effective way to formally capture design
>knowledge at different levels of specificity, which is vitally important
>when building up systems of related document types within an enterprise.
I agreed that it is vitally important. My question is why not use
an ordinary DTD to capture the "higher order things" and XSLT to
express the design knowledge of how to go from
"specific" to "general"?
>Also, when I teach architectures I try to be clear that architectures do
>not, as Sean points out, solve every processing problem. Far from it,
>and for those problems something like XSLT is the ideal solution. But
>there is a significant class of problems for which architectures are
>either just the right solution or an important and cost-effective
>component of a larger solution.
>Here is an immediate and practical example from my current project with
>the State of Texas: One of our challenges is managing the markup for the
>daily House and Senate journals (the Texas Legislature is a traditional
>American bicameral body). On their face journals are very simple, but,
>we discovered that in fact they are quite complex *if* you try to
>capture the details about each different legislative order of business
>described in the journal. We decided we did want to do this in order to
>enable the automatic generation of indexes and to help drive authoring.
>The problematic side effect of this is that each House and Senate DTD
>has about 740 element types: 200+ types for each distinct order of
>business (OOB), plus two unique subelements for each OOB (plus assorted
>other stuff to fill out the other 100 or so types).
>Now, imagine the practical problem of writing a complete style sheet for
>these DTDs: no less than 740 separate element-in-context rules, likely
>two or three times that because of contexts. Such a style sheet becomes
>almost impossible to create and manage.
Anybody who tries to solve this with 740 element in context
rules needs to take a course called
"Stand back and think every now and then: 101"
The way to solve this is to do an initial transformation
that groups elements with identical formatting semantics
under a single element type name, thus dramatically
shortening the stylesheet.
This is why XSL and DSSSL before it have transformation
as well as styling components!...
I believe that this "grouping" is exactly what you have
done using an AF approach.
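For comparison, here is a sketch of what such an AF-style mapping
might look like in the DTD itself (element and attribute names are
invented; Eliot's actual journal architecture is not shown in this
thread):

```xml
<!-- Hypothetical fixed architectural attribute: each specialized
     element type declares, in the DTD, which generic type it maps
     to. The parser supplies the value; instances need not repeat it. -->
<!ATTLIST HouseMotion
          journal NMTOKEN #FIXED "motion">
<!ATTLIST SenateVote
          journal NMTOKEN #FIXED "vote">
<!-- A processor then keys off journal="motion" or journal="vote"
     rather than off the hundreds of specialized element names. -->
```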
>If you generalize up one level, you discover that there are in fact only
>about 25 distinct *types* of element types. These 25 types also map
>pretty much one-to-one to the formatting distinctions needed to render
Hmmm indeed. The next thing to do, I would suggest, is invent
an intermediate representation, generalized up a level, so that
the 25 *types* are explicit rather than all the separate
element type names that share identical formatting semantics.
No problem. A transformation that precedes the formatting.
Bread and butter stuff....
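The grouping pre-pass described above can be sketched in a few lines
of Python (the element names and the mapping table are invented for
illustration; the real journal DTDs have some 740 types):

```python
# Minimal sketch of a grouping pre-pass: rename the many specialized
# element types that share formatting semantics to a single generic
# name, so the downstream stylesheet needs one rule per group.
import xml.etree.ElementTree as ET

# Hypothetical mapping: specialized type -> generic formatting type
GROUPS = {
    "HouseMotion": "motion",
    "SenateMotion": "motion",
    "HouseVote": "vote",
    "SenateVote": "vote",
}

def generalize(root):
    """Rename every element whose tag appears in GROUPS to its
    group name, keeping the original name as an attribute."""
    for elem in root.iter():
        if elem.tag in GROUPS:
            elem.set("specific-type", elem.tag)
            elem.tag = GROUPS[elem.tag]
    return root

doc = ET.fromstring(
    "<journal><HouseMotion>adjourn</HouseMotion>"
    "<SenateVote>32-1</SenateVote></journal>"
)
generalize(doc)
print(ET.tostring(doc, encoding="unicode"))
```

A stylesheet written against the output then addresses `motion` and
`vote` instead of every specialized name.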
>If we capture these 25 types as an architecture, we get the following:
>1. A clear, formal definition of these essential differences,
>uncluttered by the noise of the 740+ specialized element types.
>2. A simple syntactic structure to hook generalized processing to (an
>attribute whose value will be one of the 25 types)
>3. A clear definition, *within the journal DTDs*, of the basic type each
>specialized type maps to, making the DTD more self describing and making
>the knowledge of the binding generally available.
>So we did this. After adding the "journal architecture" mapping to the
>journal DTDs, it became possible to write simple programs like the one
>shown below, which generates an ADEPT*Editor style sheet (FOSI). This
>program is trivially easy to maintain, in sharp contrast to the
>alternative described above. The indirection of the architecture, which,
>as Sean says, really isn't very involved, gives us a huge amount of
>leverage. I would estimate that adding this architecture to the DTDs has
>already saved me personally 40-80 hours in time lost due to the sheer
>number of element types in these two DTDs.
>The Python program that generates the FOSI is shown below in its
>entirety (it uses GroveMinder to access the DTD, but you could use any
>similar mechanism for accessing the DTD properties). Note that most of
>the script is the text of the output FOSI, not the code that implements
>the mapping. Note too how easy it is to detect and act on the
[Cool Python program elided]
I guess I can distill my questions about this into one:
What is the difference between what you have done and
an initial XSLT-expressed transformation followed
by your Python program?
XML -> XSLT -> Python Program -> FOSI
XML -> Architectural Parse -> Python Program -> FOSI
>I could have done the same thing by hard-coding the mapping to the
>journal architecture in the script (which is the implication of Sean's
It does not follow that the mapping is hard-coded in my approach.
I love externalizing to declarative syntaxes where prudent.
In this case my best option for a "mapping" is XSLT as
it holds out the prospect of specifying mappings that
are a lot more complex than simple one-to-one mappings.
Non-trivial mappings are what make the world go
around in my experience!
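To show the sort of thing I mean (names invented for the sketch): the
same source element can generalize differently depending on context,
which a single fixed attribute value per element type cannot express:

```xml
<!-- Hypothetical context-sensitive mapping: Motion generalizes to
     different target elements depending on its parent. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="House/Motion">
    <house-motion><xsl:apply-templates/></house-motion>
  </xsl:template>
  <xsl:template match="Senate/Motion">
    <senate-motion><xsl:apply-templates/></senate-motion>
  </xsl:template>
</xsl:stylesheet>
```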
> but that would have bound the knowledge that mapping
>represents into the script, rather than making it generally available to
>any processor, which is what the architectural mapping does.
As would an XSLT based expression of the mapping.
>It's really just a question of where you bind your knowledge about the
>data: in the code that processes it or in the data itself. I always
>prefer to bind my knowledge to the data.
I don't like binding data knowledge into processing code
either. That is why I am always on the lookout for
the 80/20 point between declarative and Turing-complete
modes of expression.
I see what you have done with the Texas Legislature
and agree with the reasoning. I just don't see that
doing it via AFs rather than via an intermediate
DTD/XSLT has bought you anything other than a whole
bunch of constraints you would not have had
if you had gone for the intermediate DTD.
http://www.pyxie.org - an Open Source XML Processing library for Python
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:email@example.com&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/