xml-dev - Re: Instance-centric Semantics (WAS RE: [xml-dev] Being "precise" vs be

To: "Bullard, Claude L (Len)" <clbullar@ingr.com>
Subject: Re: Instance-centric Semantics (WAS RE: [xml-dev] Being "precise" vs being "human")
From: "Steven R. Newcomb" <srn@coolheads.com>
Date: 25 Jan 2002 02:30:50 -0600
Cc: xml-dev@lists.xml.org
In-reply-to: <2C61CCE8A870D211A523080009B94E4306DF62F2@HQ5>
References: <2C61CCE8A870D211A523080009B94E4306DF62F2@HQ5>
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7

"Bullard, Claude L (Len)" <clbullar@ingr.com> writes:

> If I understand you correctly, Steve, you are 
> turning the markup world on its head a bit. 
> Instead of the element type declaration dominating 
> the instance (here is the contract, it is enforced 
> on any one of these you encounter), you are saying 
> the instance declares the classes to which it belongs 
> (here is my contract, pick the right declaration) 
> and the instance dominates the contract? 

That's right, Len.  This is about who controls what's
allowed to be said in an XML document.  It's about
allowing the document to say, "I contain data that can
be understood within the X, Y, and Z processing
contexts."  (Substitute for X, Y, and Z the favorite
software vendors of the likely customers for the
information in the document.  The information's vendor
doesn't force all of its customers to buy the same
software, just so they can use the information.)

> Is that it?  I understand that you advocate arch 
> forms, and don't dispute they are a solution to 
> that.  I want to first discuss the requirement 
> itself and ensure I understand that.  

> Then the archform solution such as

> <?xml version="1.0"?>
> <?IS10744 arch
>   name="somearch"
>   public-id="+//IDN me.com//NOTATION Some Architecture//EN"
>   dtd-public-id="+//IDN me.com//DTD Some Architecture//EN"
>   doc-elem-form="somedoc"
>   renamer-att="somedocnames"
>   options="option1 option2"
> ?>
> <mydoc somearch="somedoc"/>

> which will immediately invoke the wrath of the PIhaters, 
> but regardless of that, how does that example illustrate 
> a solution to the requirement (multiple classes for 
> one instance)?

Len, please forgive me for minimizing your example of a
"base architecture declaration" as follows:

> <?IS10744 arch
>   name="somearch"
>   dtd-system-id="http://www.somewhere.com/somedtd.dtd"
> ?>

The above Processing Instruction declares a "base
architecture" -- that is, it establishes a connection
between the document that contains the PI and the DTD
that it refers to.  (It's not so very
different from an XML Namespace declaration.)  In order
to support *multiple* inheritance by a single element,
you'd first have to have two or more PIs, referring to two or
more DTDs.  For example, let's say that we have two
PIs, one with name="somearch" and the other with
name="somearch2".  If an element in the same document
says:

<myelement somearch="foo" somearch2="bar"...

then that particular element will be subject to 

(1) all of the syntactic and semantic constraints
    imposed on <foo> elements by the information
    architecture whose DTD is referenced by the
    name="somearch" PI, and

(2) all of the syntactic and semantic constraints
    imposed on <bar> elements by the information
    architecture whose DTD is referenced by the
    name="somearch2" PI, .... *and*, of course,

(3) all of the *additional* syntactic and semantic
    constraints imposed by the primary architecture --
    the actual DTD in effect for this document -- on
    <myelement> elements.

In other words:

*  one element, with one explicit syntax,

*  plus two additional implicit syntaxes,

*  so, in effect, three syntactically- and
   semantically-interrelated interpretations.  The
   choice of interpretation is made when the user of
   the document chooses a processing (application)
   context within which the information will be used.
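
To make the "one element, three interpretations" point concrete,
here is a minimal Python sketch.  It is not SP and not a conforming
architectural-forms engine; it only reads the PIs and the
architecture-named attributes.  The second DTD URL, the <mydoc>
wrapper, and the function names are assumptions invented for the
illustration.

```python
import re
from xml.dom import minidom

# Hypothetical document: the second DTD URL and the <mydoc> wrapper
# are assumptions made for illustration only.
DOC = """<?xml version="1.0"?>
<?IS10744 arch name="somearch"
  dtd-system-id="http://www.somewhere.com/somedtd.dtd"?>
<?IS10744 arch name="somearch2"
  dtd-system-id="http://www.somewhere.com/somedtd2.dtd"?>
<mydoc><myelement somearch="foo" somearch2="bar"/></mydoc>"""

# Matches the name="value" pseudo-attributes inside the PI data.
PSEUDO_ATTR = re.compile(r'([\w.-]+)\s*=\s*"([^"]*)"')

def base_architectures(dom):
    """Map each declared architecture name to its DTD system identifier."""
    archs = {}
    for node in dom.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "IS10744"
                and node.data.split()[:1] == ["arch"]):
            attrs = dict(PSEUDO_ATTR.findall(node.data))
            archs[attrs["name"]] = attrs.get("dtd-system-id")
    return archs

def interpretations(elem, archs):
    """List (context, element type) pairs under which elem can be read."""
    views = [("primary", elem.tagName)]
    for name in archs:
        if elem.hasAttribute(name):
            views.append((name, elem.getAttribute(name)))
    return views

dom = minidom.parseString(DOC)
archs = base_architectures(dom)
elem = dom.getElementsByTagName("myelement")[0]
for context, element_type in interpretations(elem, archs):
    print(context, "->", element_type)
```

The sketch only enumerates the interpretations; checking each one
against its DTD is the job of a validating, architecture-aware
parser.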

Why is this capability important?  For all the reasons
that XML Namespaces are said to be important.  But
there are telling differences between the
"architectural forms" paradigm and the "XML Namespaces"
paradigm.  Here are two of them:

(1) The "architectural forms" paradigm offers a single
    standard way, implementable in an
    XML-1.0-conforming XML parser, to validate the
    syntactic conformance of an element instance with
    all of the constraints involved in all of its
    various interpretations (one interpretation per
    element type (in the DTD of each base architecture)
    of which the element claims to be an instance).  As
    with any other DTD, the syntactic constraints that
    a base architecture imposes on an element type
    include constraints on its context(s), as well as
    its content and attributes.
    
    Why is this important?  Remember, a DTD (or XML
    Schema or whatever) represents a kind of contract
    between information providers and software
    providers.  For the sake of all information
    providers and consumers, it's good and necessary to
    have a way to tell whether the element instance --
    as it exists in the document instance -- will be
    understandable by software that is designed to
    process data that conforms to the base
    architecture.  The "architectural forms" paradigm
    requires the XML parser to be able to report not
    only the parsed document as it was written, but
    also the parsed document as it would be reported if
    it had been explicitly marked up according to each
    of its "base architectures".  If it's a validating
    parser, then it can also report whether the
    reported "architectural instance" conforms to its
    DTD.  

    When information interchange fails, the public
    interest demands that there be a way to point the
    finger of blame at either the information provider
    *or* the software provider, or both.  Without such
    a publicly-available functionality, the information
    industry will never be independent of the
    information processing industry, the nature of what
    can be said will be limited to what the major
    software houses deign to allow us to say, and the
    evolution of civilization's nervous system will be
    distorted.  (I'm reminded of the fact that there is
    legislation now before Congress that would allow
    the owners of the communications networks to give
    precedence and speedier delivery to packets in
    which they have a special business interest.
    Should the network owners have the power to censor
    the public bandwidth?  I hope not.  And neither
    should the major software vendors have the power to
    censor the interchange of information based on
    their de facto hegemony over the world's desktops.)

    There is no room here for doubters who say, "Show
    me the code."  Standard, super-quality, open-source
    software for this purpose already exists, and has
    existed for years.  James Clark's "SP" parser
    already has everything necessary to do the job, and
    it has been upgraded by Luis Martinez and Peter
    Newcomb to support inheritable XML architectures,
    using the ISO standard Processing Instruction-based
    syntax for declaring "base architectures".  There's
    a link to this software, compiled for Linux,
    Windows, and Solaris, at
    http://www.hytime.org/htnews.html (first paragraph)

(2) Similarly, the "architectural forms" paradigm means
    that software that understands a given information
    architecture can be re-used, as an engine, in the
    context of more specialized applications.  The same
    idea appears in XLink.
    People who wonder why XLink isn't more popular
    should consider how unfriendly the current XML
    syntactic environment is to the use of such
    "engine" software.  There's no way for developers
    to prove that their engines function correctly, and
    there's no way for information providers to have
    assurance that they are using a particular "element
    template" (as the XLink Recommendation calls
    architectural forms) correctly.  Indeed, it's hard
    to see how XML Namespaces could be more perfectly
    contrived to preclude such general accountability.
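
The "architectural instance" mentioned under (1), and the engine
re-use mentioned under (2), can be roughly sketched in Python as
follows.  This is a deliberately simplified assumption of mine, not
the ISO algorithm: an element that does not name a form in the
chosen architecture is dropped and its children are promoted,
attribute renaming is ignored, and only leading text is carried
over.  The sample document, the id attribute, and the function name
are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; <note> belongs to no base architecture.
DOC = ('<mydoc somearch="somedoc">'
       '<myelement somearch="foo" somearch2="bar" id="e1">hello</myelement>'
       '<note>local only</note>'
       '</mydoc>')

ARCH_NAMES = {"somearch", "somearch2"}

def architectural_instance(elem, arch):
    """Return the elements that elem contributes to the view for arch.

    Simplification: an element with no attribute named after the
    architecture is skipped and its derived children are promoted.
    """
    children = []
    for child in elem:
        children.extend(architectural_instance(child, arch))
    form = elem.get(arch)
    if form is None:
        return children
    # Rename the element to its architectural form and drop the
    # architecture-control attributes themselves.
    attrs = {k: v for k, v in elem.attrib.items() if k not in ARCH_NAMES}
    derived = ET.Element(form, attrs)
    derived.text = elem.text
    derived.extend(children)
    return [derived]

root = ET.fromstring(DOC)
view1 = [ET.tostring(e, encoding="unicode")
         for e in architectural_instance(root, "somearch")]
view2 = [ET.tostring(e, encoding="unicode")
         for e in architectural_instance(root, "somearch2")]
print(view1)
print(view2)
```

An "engine" written for the somearch architecture would consume
only the first view and never needs to know that the source
element was called <myelement>.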

Why am I still beating this dead horse?

Just because XML people should pay attention to this.
Either:

(1) XML will provide a real platform for collaborative
    work, in which real control over syntax can be
    really distributed, without losing the syntactic
    rigor necessary to guarantee effective information
    interchange, or

(2) significant portions of the potential for XML
    syntax to serve as a basis for collaborative work
    in the sunshine will be lost.

Maybe #2 would be the best outcome, all things
considered.  Personally, however, I think that with
"XML Namespaces" the XML community has been shooting
itself in the foot, from the beginning to the present.

Regardless of whether I'm right or wrong, the XML
community would be well advised to consider: Whose
interests are served by the technology of XML
Namespaces, and how?  What vision of the future is
guiding the technical development, here?  And is that
the future we really want?

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA

References:
- Instance-centric Semantics (WAS RE: [xml-dev] Being "precise" vs being "human")
  - From: "Bullard, Claude L (Len)" <clbullar@ingr.com>
