xml-dev - Re: Question about Architectures and Versioning

Re: Question about Architectures and Versioning
From: "Steven R. Newcomb" <srn@techno.com>
To: andrewl@microsoft.com
Date: Sat, 13 Jun 1998 17:42:45 -0500

[Andrew Layman:]

> In the example below, when you give the amended V2 architecture, is
> there a typo in which you call this several times the "V1 amended"
> architecture, or am I reading things incorrectly?

Yeah, sorry about that.  It's a bit confusing because of what's really
going on here (see further explanation below).  When I said:

> > **************************************
> > ** Parsing I2 against V1 as amended **
> > **************************************
> > If we parse I2 against V1, as amended, we get:

...I should have said:

> > ***********************************************************
> > ** Parsing I2 against V1 via the amended V2 architecture **
> > ***********************************************************
> > If we parse I2 against V1 via the amended V2 architecture, we get:

A possible source of confusion is the fact that the V2 architecture
specifies that the V1 architecture is a "base" architecture with
respect to the V2 architecture, which is "derived" from (or is a
"client" of) V1.  The V2 architecture does not specify itself in any
way; it *is* the V2 architecture.  At the risk of belaboring the
obvious, let me say that the only things that specify the V2
architecture are:

* document instances that are clients of the V2 architecture, and

* architectures that regard the V2 architecture as a base
  architecture.

The structure of such V2 client instances and architectures is
defined by the V1 architecture through the lens (so to speak) of the
V2 architecture.

> I want to study your example carefully, and want to be sure that all
> the V1s and V2s are exactly as you intend them.  Thanks.

Far as I know, these are correct.  I was pretty careful.  (That
doesn't mean we won't discover errors.)  And thank you for your
attention; it makes me feel like all this effort is not wasted.

> Regarding which meta-DTD we would want to use, I'd want to be able
> to use a V1 meta-DTD, without modification, against a V2 instance.

If (as is the case in our example) one architecture is derived from
another, we will need to use the derived architecture as a map to
understand our clients in terms of the base architecture.  

If, on the other hand, our client instance directly specifies two base
architectures, then either of the corresponding meta-DTDs can be used
directly against the instance, thus meeting your requirement "to be
able to use a V1 meta-DTD, without modification, against a V2
instance".  (See new example below: "Two Directly Specified Base
Architectures".)

Either way, all the relevant meta-DTDs collectively drive a standard
generic parsing process, so we don't need to use any
architecture-specific software to create architecture-specific parse
trees.  (If we did, the economics of AFs would make no sense.)  I'm
trying to show you that AFs provide a way to view older documents
through newer architectures, and newer documents through older
architectures, gracefully, simply, and reliably, without too much
ugliness, syntax, or software.  Let me explain architectural parsing
in a step-by-step fashion.

Here's the final version of the V2 client (again):

<!-- instance #I2 -->
<?IS10744:arch name="V2"  ... ?>
<Mydoc>
    <Book>
        <Title>Gone With the Wind</Title>
        <Author>
            <Person>
                <Firstname>Margaret</Firstname>
                <Lastname>Mitchell</Lastname>
            </Person>
        </Author>
    </Book>
</Mydoc>

Note that it's a V2 client, and not a V1 client.  If it were a V1
client, it would say so.  Indeed, there is no direct evidence here
that V1 has anything to do with this document.  (It could declare any
number of base architectures, of course.  See "Two Directly Specified
Base Architectures" below.)

However, a standard, architectures-aware parser will gain access
to the V2 architecture's meta-DTD...

<!-- the V2 architecture, as amended -->
<?IS10744:arch 
   name="V1" 
   dtd-public-id="-//Andrew Layman//DTD The V1 Architecture//EN"
   ignore-data-att="V1IgnoreData"
>
<!ELEMENT V2 - - (Book)>
<!ELEMENT Book - - (Title?, Author?)>  
<!ELEMENT Title - - (#PCDATA)>
<!ELEMENT Author - - (Person)>
<!ATTLIST Author
    V1IgnoreData  CDATA  "ArcIgnD"
>
<!ELEMENT Person - - (Firstname, Lastname)>
<!ELEMENT Firstname - - (#PCDATA)>
<!ELEMENT Lastname - - (#PCDATA)>

...at which time the parser discovers that the V2 architecture is a
client of the V1 architecture.  So, the parser gets the V1
architecture's meta-DTD...

<!-- the V1 architecture -->
<!ELEMENT V1 - - (Book)>
<!ELEMENT Book - - (#PCDATA)>

...which is not a client of any other architecture.  Thus, the parser
knows that there are three (conceptual) groves that are extractable
from this document: a grove of the instance itself, a grove from the
perspective of the V2 architecture, and a grove from the perspective
of the V1 architecture.
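
As a rough illustration of how the V1-perspective grove could be
derived from I2, here is a minimal Python sketch.  It is not a real
architectural engine: each meta-DTD is reduced to a hypothetical table
saying whether an element type allows #PCDATA, content models are not
validated, whitespace-only data is skipped, and the names
V1_FORMS, V2_DEFAULTS_FOR_V1, project and architectural_view are
invented for this example.  The processing instruction is left out of
the instance because its contents are elided above.

```python
import xml.etree.ElementTree as ET

# Assumption: a meta-DTD reduced to "element type -> is #PCDATA allowed?".
V1_FORMS = {"Book": True}

# Assumption: the V1IgnoreData defaults declared by the amended V2 meta-DTD.
V2_DEFAULTS_FOR_V1 = {"Author": "ArcIgnD"}

I2 = """
<Mydoc>
    <Book>
        <Title>Gone With the Wind</Title>
        <Author>
            <Person>
                <Firstname>Margaret</Firstname>
                <Lastname>Mitchell</Lastname>
            </Person>
        </Author>
    </Book>
</Mydoc>
"""


def append_data(out, data):
    """Append character data at the end of out's content."""
    if len(out):
        out[-1].tail = (out[-1].tail or "") + data
    else:
        out.text = (out.text or "") + data


def project(src, out, allows_data, mode, forms, defaults, ignore_att):
    """Copy into `out` the part of src's content that the architecture sees."""

    def data(chunk):
        if not chunk or not chunk.strip():
            return  # whitespace-only separators are skipped in this sketch
        if mode == "ArcIgnD":
            return  # data is always ignored
        if allows_data:
            append_data(out, chunk)
        elif mode == "nArcIgnD":
            raise ValueError("data not allowed in <%s>" % out.tag)
        # cArcIgnD: data that is not allowed here is silently ignored

    data(src.text)
    for child in src:
        # Explicit attribute, else meta-DTD default, else inherited value.
        child_mode = child.get(ignore_att, defaults.get(child.tag, mode))
        if child.tag in forms:
            # Automatic name mapping: same name as an architectural form.
            project(child, ET.SubElement(out, child.tag), forms[child.tag],
                    child_mode, forms, defaults, ignore_att)
        else:
            # Not architectural: the markup disappears, the data flows up.
            project(child, out, allows_data, child_mode,
                    forms, defaults, ignore_att)
        data(child.tail)


def architectural_view(instance_xml, arch_name, forms, defaults, ignore_att):
    root = ET.fromstring(instance_xml.strip())
    out = ET.Element(arch_name)  # document element becomes the architecture's
    project(root, out, False, "cArcIgnD", forms, defaults, ignore_att)
    return ET.tostring(out, encoding="unicode")


print(architectural_view(I2, "V1", V1_FORMS, V2_DEFAULTS_FOR_V1, "V1IgnoreData"))
# prints <V1><Book>Gone With the Wind</Book></V1>
```

Passing an empty defaults table instead of V2_DEFAULTS_FOR_V1 gives the
unamended result, in which the author's name runs into the title.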

Suppose we are running an application that only knows how to deal with
V1 client instances, and it has never heard of the V2 architecture.  So
how can it know that V2 client instances are processable by it?  The
answer is that it can't, unless the application was originally
designed to be brought online iff a V1 grove appears as a result of
parsing a document.  It follows that documents should be processed on
the basis of what groves turn out to be extractable from them, rather
than on the basis of what architectures they declare explicitly.  In
my mind, this application design rule illustrates a strength of the AF
paradigm, not a weakness.  It means that documents are always as
interpretable as they can be, given local semantic processing
resources, regardless of who created the instances, or who created
their meta-DTDs, or what other meta-DTDs those meta-DTDs were derived
from, or how they were (validly) mixed together.  Thus, AFs allow
complete decentralization of architectural authority, while providing
for perfect reusability of all architectures.  (There are other
features of AFs, which we haven't talked about yet, that permit
syntactic conflicts between architectures to be resolved in client
architectures and client instances.)
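
The design rule above (dispatch on the groves that turn out to be
extractable, not on the architectures a document names) might be
sketched as follows.  This is only an illustration under stated
assumptions: BASES is a hypothetical table of which architecture
declares which base architectures, HANDLERS is a hypothetical registry
of locally installed applications, and no actual parsing is done.

```python
# Assumption: architecture name -> base architectures its meta-DTD declares.
BASES = {"V2": ["V1"], "V1": []}

# Assumption: applications installed locally, keyed by the architecture
# whose grove they know how to process.  Nothing here knows about V2.
HANDLERS = {"V1": lambda: "V1 application started"}


def extractable_architectures(declared, bases):
    """Follow base-architecture declarations from the instance outward."""
    found = []
    pending = list(declared)
    while pending:
        name = pending.pop(0)
        if name not in found:
            found.append(name)
            pending.extend(bases.get(name, []))
    return found


def dispatch(declared, bases, handlers):
    """Start every local application whose grove can be extracted."""
    return [handlers[name]()
            for name in extractable_architectures(declared, bases)
            if name in handlers]


# The instance declares only V2, yet the V1 application is brought online.
print(extractable_architectures(["V2"], BASES))  # ['V2', 'V1']
print(dispatch(["V2"], BASES, HANDLERS))         # ['V1 application started']
```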

Digression: It is important that we embrace the AF paradigm (or
            something mighty similar) sooner rather than later.  The
            longer we wait, the more of today's information will not
            be able to participate in tomorrow's mainstream, in which
            many constantly-evolving systems of semantic markup will
            routinely appear within single documents.  How long do we
            want to exclude Web documents from the mainstream of
            civilization's lifeblood, and civilization's lifeblood
            from the Web?  And how do we explain the reason for the
            delay?


*************************************************
** "Two Directly Specified Base Architectures" **
*************************************************

OK, enough soapboxing.  Here's another way to accomplish the same
goal.  The difference is that in the example below, the instance is a
direct client of both the V1 and V2B architectures, and the V2B
architecture is not a client of the V1 architecture.  I made some
other changes just to demonstrate the fact that the AF paradigm does
not tread on the application's generic identifier namespace; this
means not taking advantage of the automatic name mapping feature, and
therefore, there is much more verbosity in the client instance.


*************************
** The V1 Architecture **
*************************
<!-- the V1 architecture, same as always -->
<!ELEMENT V1 - - (Book)>
<!ELEMENT Book - - (#PCDATA)>


**************************
** The V2B Architecture **
**************************
<!-- the V2B architecture. No base architecture. -->
<!ELEMENT V2B - - (Book)>
<!ELEMENT Book - - (Title?, Author?)>  
<!ELEMENT Title - - (#PCDATA)>
<!ELEMENT Author - - (Person)>
<!ELEMENT Person - - (Firstname, Lastname)>
<!ELEMENT Firstname - - (#PCDATA)>
<!ELEMENT Lastname - - (#PCDATA)>


******************
** Instance I2B **
******************
<!-- instance #I2B. Two base architectures. -->
<?IS10744:arch name="V1" ... ?>
<?IS10744:arch name="V2B" ... ?>
<Mydoc>
    <MyBook V2B="Book">
        <MyTitle V1="Book" V2B="Title">Gone With the Wind</MyTitle>
        <MyAuthor V2B="Author">
            <MyPerson V2B="Person">
                <MyFirstname V2B="Firstname">Margaret</MyFirstname>
                <MyLastname V2B="Lastname">Mitchell</MyLastname>
            </MyPerson>
        </MyAuthor>
    </MyBook>
</Mydoc>


Parsing I2B against V1, directly, without reference to V2B, we get:

<V1>
    <Book>Gone With the Wind</Book>
</V1>

At this point, you may ask, "What happened to 'Margaret Mitchell'?"
because no ignore-data-att attribute appears in the above example.
It turns out that we don't need one.  The PCDATA would have had to
appear after the <Book> element...

<V1>
    <Book>Gone With the Wind</Book>MargaretMitchell
</V1>

...and the V1 architecture doesn't allow PCDATA there.  Because the
default effective value of the ignore-data-att is "cArcIgnD"
("conditional architecture ignore data"), data that appears where it's
not allowed is ignored.  So, I guess we can truthfully say that
Margaret Mitchell is "Gone with the Ignored Data".



Parsing I2B against V2B, we get:

<V2B>
    <Book>
        <Title>Gone With the Wind</Title>
        <Author>
            <Person>
                <Firstname>Margaret</Firstname>
                <Lastname>Mitchell</Lastname>
            </Person>
        </Author>
    </Book>
</V2B>

(i.e., no surprises)
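
Both results for I2B can be reproduced with a small Python sketch.
Again, this is a simplification and not a conforming architectural
processor: an element is treated as architectural only when it carries
an attribute named after the architecture, each meta-DTD is reduced to
a hypothetical table of which forms allow #PCDATA, only the default
cArcIgnD behavior is modeled, and the names V1_FORMS, V2B_FORMS,
project and view are invented here.  The two processing instructions
are omitted because their contents are elided in the instance above.

```python
import xml.etree.ElementTree as ET

I2B = """
<Mydoc>
    <MyBook V2B="Book">
        <MyTitle V1="Book" V2B="Title">Gone With the Wind</MyTitle>
        <MyAuthor V2B="Author">
            <MyPerson V2B="Person">
                <MyFirstname V2B="Firstname">Margaret</MyFirstname>
                <MyLastname V2B="Lastname">Mitchell</MyLastname>
            </MyPerson>
        </MyAuthor>
    </MyBook>
</Mydoc>
"""

# Assumption: each meta-DTD reduced to "form name -> is #PCDATA allowed?".
V1_FORMS = {"Book": True}
V2B_FORMS = {"Book": False, "Title": True, "Author": False,
             "Person": False, "Firstname": True, "Lastname": True}


def add_data(out, chunk, allows_data):
    """Default cArcIgnD behavior: keep data only where it is allowed."""
    if not chunk or not chunk.strip() or not allows_data:
        return
    if len(out):
        out[-1].tail = (out[-1].tail or "") + chunk
    else:
        out.text = (out.text or "") + chunk


def project(src, out, allows_data, arch, forms):
    add_data(out, src.text, allows_data)
    for child in src:
        form = child.get(arch)  # e.g. V1="Book" names the V1 form
        if form in forms:
            project(child, ET.SubElement(out, form), forms[form], arch, forms)
        else:
            # No form for this architecture: the markup disappears.
            project(child, out, allows_data, arch, forms)
        add_data(out, child.tail, allows_data)


def view(instance_xml, arch, forms):
    root = ET.fromstring(instance_xml.strip())
    out = ET.Element(arch)  # document element becomes the architecture's
    project(root, out, False, arch, forms)
    return ET.tostring(out, encoding="unicode")


print(view(I2B, "V1", V1_FORMS))
# prints <V1><Book>Gone With the Wind</Book></V1>
print(view(I2B, "V2B", V2B_FORMS))
```

The V1 call never consults V2B_FORMS, which is the point of this
example: the V1 view is obtained from the V1 table alone.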

I think the above example would be more realistic if we weren't
thinking in terms of V2 being a revision of V1.  If they were
completely independent architectures, then the above would make more
sense and be more dramatic.  Anyway, this example meets your criterion
of being able to parse the instance directly from the V1 meta-DTD,
without involving the V2 meta-DTD at all.

Since V2 is a revision of V1, we would normally not want to require
users of the V2 architecture to mark up their documents in terms of
the V1 architecture as well as the V2 architecture; we would expect
users to declare V2 and we would expect V1 software to be able to
comprehend V2 documents to the same extent that it could comprehend V1
documents, as shown in the original example in which V2 is derived
from V1.


--Steve

Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com  ftp.techno.com

voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax    +1 972 994 0087 (at ISOGEN: +1 214 953 3152)

3615 Tanner Lane
Richardson, Texas 75082-2618 USA



> -----Original Message-----
> From: Steven R. Newcomb [mailto:srn@techno.com]
> Sent: Friday, June 12, 1998 4:01 PM
> To: Andrew Layman
> Cc: xml-dev@ic.ac.uk
> Subject: Re: Question about Architectures and Versioning
> 
> 
> > From: Andrew Layman <andrewl@microsoft.com>
> > 
> > How does one go about using Architectures to solve the following problem.
> >  
> > Suppose in version one of my documents, I have instances that look like
> >  
> > <Book>Gone With the Wind</Book>
> >  
> > In version 2, I have instances that look like
> >  
> > <Book>
> >     <Title>Gone With the Wind</Title>
> >     <Author>
> >         <Person>
> >             <Firstname>Margaret</Firstname>
> >             <Lastname>Mitchell</Lastname>
> >         </Person>
> >     </Author>
> > </Book>
> > 
> > How do I write my architectures so that the V2 instance is mapped to
> > the V1 architecture?
> 
> Andrew --
> 
> You've asked a good question.  I think it has a good answer.  In order
> to explain this, I have to define the V1 and V2 architectures, and
> turn your example fragments into complete documents.  Then I'll
> discuss what problems arise, and what to do about them.
> 
> 
> *************************
> ** The V1 Architecture **
> *************************
> <!-- the V1 architecture -->
> <!ELEMENT V1 - - (Book)>
> <!ELEMENT Book - - (#PCDATA)>
> 
> 
> *************************
> ** The V2 Architecture **
> *************************
> <!-- the V2 architecture -->
> <?IS10744:arch 
>    name="V1" 
>    dtd-public-id="-//Andrew Layman//DTD The V1 Architecture//EN"
> >
> <!ELEMENT V2 - - (Book)>
> <!ELEMENT Book - - (Title?, Author?)>  
>         <!-- note: auto name mapping is on, so elements of the above type
>              will be regarded as conforming to the V1 <Book> architectural
>              form -->
> <!ELEMENT Title - - (#PCDATA)>
> <!ELEMENT Author - - (Person)>
> <!ELEMENT Person - - (Firstname, Lastname)>
> <!ELEMENT Firstname - - (#PCDATA)>
> <!ELEMENT Lastname - - (#PCDATA)>
> 
> 
> *****************
> ** Instance I1 **
> *****************
> <!-- instance #I1 -->
> <Mydoc>
>     <Book>Gone With the Wind</Book>
> </Mydoc>
> 
> 
> *****************
> ** Instance I2 **
> *****************
> <!-- instance #I2 -->
> <Mydoc>
>     <Book>
>         <Title>Gone With the Wind</Title>
>         <Author>
>             <Person>
>                 <Firstname>Margaret</Firstname>
>                 <Lastname>Mitchell</Lastname>
>             </Person>
>         </Author>
>     </Book>
> </Mydoc>
> 
> 
> ***************************
> ** Parsing I1 against V1 **
> ***************************
> If we parse I1 against V1, we get a grove that, if it were
> re-expressed in XML, would look like this:
> 
> <V1>
>     <Book>Gone With the Wind</Book>
> </V1>
> 
> I.e., No problem.  (And no surprise.)  Note that the
> document element has automatically become the document
> element of the architecture.
> 
> 
> ***************************
> ** Parsing I2 against V2 **
> ***************************
> If we parse I2 against V2, we get:
> 
> <V2>
>     <Book>
>         <Title>Gone With the Wind</Title>
>         <Author>
>             <Person>
>                 <Firstname>Margaret</Firstname>
>                 <Lastname>Mitchell</Lastname>
>             </Person>
>         </Author>
>     </Book>
> </V2>
> 
> I.e., again, no problem.  (And, again, no surprise.)
> 
> 
> ***************************
> ** Parsing I2 against V1 **
> ***************************
> If we parse I2 against V1, taking no other measures, we get:
> <V1>
>     <Book>Gone With the WindMargaretMitchell</Book>
> </V1>
> 
> Clearly, this is a mess, but it illustrates the principle that, by
> default, markup that does not belong in a given architecture simply
> disappears, from the perspective of that architecture.  What to do
> about the mess, though?
> 
> It's reasonable to assume that the person who writes the V2
> architecture intends for V2 documents to be usable with V1 browsers
> (or other applications equipped with V1 engines).  In other words, we
> want the title of the book to become the content of the <Book>
> element, as was the case in the V1 architecture, and we want Margaret
> Mitchell's name to disappear, since the V1 architecture made no
> provision for an author's name.  This can be done as follows:
> 
> <!-- the V2 architecture, as amended -->
> <?IS10744:arch 
>    name="V1" 
>    dtd-public-id="-//Andrew Layman//DTD The V1 Architecture//EN"
>    ignore-data-att="V1IgnoreData"
> >
> <!ELEMENT V2 - - (Book)>
> <!ELEMENT Book - - (Title?, Author?)>  
> <!ELEMENT Title - - (#PCDATA)>
> <!ELEMENT Author - - (Person)>
> <!ATTLIST Author
>     V1IgnoreData  CDATA  "ArcIgnD"
> >
> <!ELEMENT Person - - (Firstname, Lastname)>
> <!ELEMENT Firstname - - (#PCDATA)>
> <!ELEMENT Lastname - - (#PCDATA)>
> 
> Note that we have declared that the name of the "Architecture Ignore
> Data Attribute" for the V1 architecture is "V1IgnoreData".  When this
> attribute appears on an element instance, its value controls whether
> the ultimate data content of the element will be regarded as part of
> the document, from the perspective of this architecture.  We have also
> declared, above, that the V1IgnoreData attribute has a default value
> of "ArcIgnD" on instances of the <Author> element.  This means that,
> from the perspective of the V1 architecture, the data content of
> the <Author> element, and the data contents of all of the elements
> that it contains, will be ignored (will disappear).
> 
>   Digression: The possible values of any "architecture ignore data
>               attribute" are:
> 
>               ArcIgnD  : Data is always ignored.
> 
>               nArcIgnD : Data is not ignored, and it is an error if
>                          data occurs where the architecture does not
>                          allow it.
> 
>               cArcIgnD : Data is conditionally ignored (data will be
>                          ignored only when it occurs where the
>                          architecture does not allow it.)
> 
>               The default value is taken to be cArcIgnD.
> 
> 
> **************************************
> ** Parsing I2 against V1 as amended **
> **************************************
> If we parse I2 against V1, as amended, we get:
> <V1>
>     <Book>Gone With the Wind</Book>
> </V1>
> 
> 
> Q.E.D., right?
> 
> 
> 
> 
> ***************************
> ** Parsing I1 against V2 **
> ***************************
> If we parse I1 against V2, taking no other measures, we get:
> 
> <V2>
>     <Book></Book>
> </V2>
> 
> What happened to the title of the book?  It disappeared because the
> default value of the ignore-data-att is "cArcIgnD", which means that
> when data is not allowed in the content of an element, it will be
> ignored.  The V2 architecture does not permit #PCDATA in the content
> of <Book> elements, so the data "Gone With the Wind" disappeared
> automatically.  If we don't want the data to be ignored, we can force
> the data to appear by setting V2IgnoreData to "nArcIgnD".  However,
> making the data appear where it's not allowed to appear will create a
> parsing validation error, so, if we really need to use the same
> meta-DTD for both V1 and V2 documents (we don't), this
> solution is not so good.
> 
> If we must use the same meta-DTD for both older V1 documents and newer
> V2 documents, in order to maintain the upward compatibility of older
> V1 documents it would be best, when creating the V2 architecture, to
> anticipate this problem as follows:
> 
> (1) Allow #PCDATA in the content of V2 <Book> elements, in addition to
>     the <Title> and <Author> elements, and
> 
> (2) Provide instructions to V2 application developers (in the V2
>     Architecture Definition Document [ADD]) indicating that V2
>     application engines must expect #PCDATA in <Book> instances, and
>     that they must treat such data content as if it were in a V2
>     <Title> element.  The ADD might also advise that V2 systems should
>     not create documents that put #PCDATA in the content of <Book>
>     elements, even though it's allowed there, and that book titles
>     should always appear in <Title> elements.
> 
> 
> *******************************************************************************
> 
> But how can we do all this without a meta-DTD of any kind?
> 
> Well, first, a caveat: you can't check an instance for conformance to
> a model unless you have both the instance and the model.  So
> validation of instances by means of a general-purpose parser is not
> possible unless you have a meta-DTD.  
> 
> And a second caveat: you can't create an application with an
> information-interchange feature unless you have a model for the
> information to be interchanged.  So, at some level, there's no such
> thing as an architecture without some sort of model, somewhere.
> 
> Even if there's no meta-DTD available, however, you can still enjoy
> essentially all of the virtues of AFs, assuming you have an engine
> capable of recognizing the architectural forms that pertain to it, and
> capable of performing the processing required by those architectural
> forms.  (Such an engine would probably incorporate at least some of
> the logic necessary to validate the forms that it recognizes, in any
> case.)  The only really noticeable disadvantage of not having the
> meta-DTD handy is that you don't get the markup minimization you can
> get from DTDs and meta-DTDs.  This disadvantage would not affect our
> instance #I1 at all:
> 
> <!-- instance #I1; no change -->
> <?IS10744:arch name="V1">
> <Mydoc>
>     <Book>Gone With the Wind</Book>
> </Mydoc>
> 
> But it would affect instance #I2 to the extent that we'd have to make
> the use of the "Architecture Ignore Data Attribute" explicit in order
> for #I2 to be usefully parsable against Architecture V1:
> 
> <!-- instance #I2 without meta-DTDs -->
> <?IS10744:arch 
>    name="V1" 
>    public-id="-//Andrew Layman//ADD Andrew Layman's V1 Architecture Definition Document//EN"
>    ignore-data-att="V1IgnoreData"
> >
> <?IS10744:arch
>    name="V2"
>    public-id="-//Andrew Layman//ADD Andrew Layman's V2 Architecture Definition Document//EN"
> >
> <Mydoc>
>     <Book>
>         <Title>Gone With the Wind</Title>
>         <Author V1IgnoreData="ArcIgnD">
>             <Person>
>                 <Firstname>Margaret</Firstname>
>                 <Lastname>Mitchell</Lastname>
>             </Person>
>         </Author>
>     </Book>
> </Mydoc>
> 
> Note: Just for fun, I used the "public-id" pseudo-attribute to give
> the formal public identifiers of the Architecture Definition Documents
> (ADDs) of the V1 and V2 architectures.  These documents are not
> meta-DTDs (although they may include meta-DTDs) and they are not
> directly machine-processable; they are just explanations of the
> architectures, probably written in some natural language (these are
> declared to be in English: "//EN").  The purpose of declaring them is
> merely to disambiguate the architectures we're declaring from any
> others that might be called "V1" or "V2".
> 
> Final note #1: With AFs, even when we mix many kinds of semantics and
> vocabularies into our documents, we can still have the ability to
> verify, simply and directly, that any newly created document that uses
> an architecture will be reliably processable by any application of
> that architecture.  By the same token, anyone creating an application
> of that architecture will not face an indefinitely-long list of
> possible configurations of the information.
> 
> Final, final note: AFs are an elegant general solution to the problem
> of recognizing, processing, and mixing all of the semantic facilities
> of XML into arbitrary XML documents, including both RDF and XLink, to
> name two, with minimal or no cost to the flexibility of other document
> architectures.  They also have the effect of giving people other than
> the W3C ability to create similar, but totally arbitrary
> metastructures of arbitrary complexity, and to use them for reliable
> and robust information interchange.  I remain utterly and passionately
> convinced that it's MUCH better to have one, strong, general way of
> mixing common semantic constructs into structured documents, than to
> have several dissimilar ways of doing so.
> 
> -Steve
> 
> --
> Steven R. Newcomb, President, TechnoTeacher, Inc.
> srn@techno.com  http://www.techno.com  ftp.techno.com
> 
> voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
> fax    +1 972 994 0087 (at ISOGEN: +1 214 953 3152)
> 
> 3615 Tanner Lane
> Richardson, Texas 75082-2618 USA
> 
> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
> (un)subscribe xml-dev
> To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
> subscribe xml-dev-digest
> List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
> 
