xml-dev - Re: [xml-dev] Quiz: XML flexibility

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Quiz: XML flexibility
From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
Date: Sun, 27 Feb 2005 19:54:05 -0500
In-reply-to: <20050228000528.an90dj2sldtwgkos@www.mihaiu.name>
At 2005-02-28 00:05 +0200, Razvan MIHAIU wrote:
>...
>     Questions like this are asked *today* in an XML certification exam 
> like IBM 141.
>...
>     This is why I am worried about this exam: on many occasions I found
> out that the question's text did not provide enough information. In such
> situations you just have to try to guess what the exam writer was thinking
> and as a result you are guessing the answer too.

That is so true and hasn't changed.

>     I will try to clarify the question.

No need for me, I understood your question and I think there is no 
"correct" answer.

>The answer from this quiz suggests that
>flexibility can only be achieved by using mixed content models. Should I just
>discard this (as Michael Kay suggested) or is there some truth behind this
>assertion?

But flexibility at what cost?  When you introduce mixed content, how can 
you constrain the input to be desired sequences?  How can you prevent 
unnecessary and incorrect text between the elements you want in sequence?

Mixed content has a role, but that role is not to open up a model for
"flexibility" while abandoning the ability to constrain documents to new
business requirements ... doing so defeats the purpose of document modeling.
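
A minimal sketch of the concern above, assuming DTD-style mixed content of the form (#PCDATA | c | d)* and treating each content model as a regular expression over child tokens. The "#text" token spelling and the helper name accepts are assumptions made for illustration only; this is not a real validator.

```python
import re

# Content models written as regular expressions over space-terminated tokens.
# "#text" stands for a run of character data between elements (assumed spelling).
SEQUENCE_MODEL = re.compile(r"c d ")             # element-only content: c then d
MIXED_MODEL = re.compile(r"(?:#text |c |d )*")   # DTD-style mixed: (#PCDATA | c | d)*


def accepts(model: re.Pattern, children: list[str]) -> bool:
    """Return True if the whole child sequence matches the content model."""
    return model.fullmatch("".join(token + " " for token in children)) is not None


wanted = ["c", "d"]
wrong_order = ["d", "c"]
stray_text = ["c", "#text", "d"]

for children in (wanted, wrong_order, stray_text):
    print(children, accepts(SEQUENCE_MODEL, children), accepts(MIXED_MODEL, children))
```

Only the first sequence satisfies the element-only model, while the mixed model accepts all three, including the wrong order and the stray text.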

> > Document models described by W3C Schema are extensible only by adding new
> > constructs to the end of content models of existing constructs ... to me
> > this is not very flexible at all ... I might want to put new information in
> > the middle of a content model.
>
>    You can achieve this with mixed content models.

I disagree.  You cannot constrain your new documents to your new business
requirements if you throw everything into mixed content.  You cannot
flexibly add any new requirements for constrained validation if you utilize
a modeling semantic (mixed content) that does not constrain.

>With this you can add as many elements as you want where you want.

That's the problem.  I may wish to constrain a new document model to be the
old document model plus some new, constrained elements that are not found
only at the end of the old document model.

Mixed content is not a panacea ... it has its role, and using it for 
"flexibility" is not it.

So, of the given four answers, I believe none of them are correct and I 
would challenge arguments that any one is ... have you been told which 
answer the poser of the question believes is correct?

> > Document models described by RELAX-NG are extensible merely by producing
> > the union of *any* two other document models ... to me this is *very*
> > flexible.  I can accommodate old and new instances of any vocabulary in
> > this simple fashion by creating the new union vocabulary.
> >
> > Running a query on a document modeled by W3C Schema can produce an instance
> > result that cannot be modeled by W3C Schema mechanically by a machine
> > analysis of the schema expression.  Consider a document where the element
> > "p" is modelled one way in one context using a given type, and modelled
> > another way in another context using a different type, and the query
> > returns an instance of all "p" elements ... since sibling "p" elements in
> > W3C Schema cannot have different content models, I cannot machine-generate
> > a W3C Schema expression of the model of this result, and I'm obliged to do
> > so by hand without the benefit of co-occurrence constraints to help.  To me
> > this is not very flexible.
> >
>     I did not read about XQuery yet.

I did not say anything about XQuery ... all I said is "running a query" ... 
there are many languages that will give you the result of querying a subset 
from another document.

>"Fragments", "XQuery" and "XLink" are my
>targets for the next week.

Please don't get distracted from my point ... I'm only talking about 
obtaining the result of asking for the content from an XML document.  I 
didn't mention any standards for having obtained that content ... it is 
irrelevant to my point.

>I just want to say that if your "p" is found in the
>same XML instance then it must be in different namespaces otherwise the
>document would be invalid.

This is not true from an instance perspective.  Validity is measured by the 
meeting of expressed constraints using the semantics employed by a 
validation language.  Namespaces have nothing to do with my point.

>Since they are in different namespaces it seems
>logical that there is a way to differentiate between them using their
>namespaces. I will not comment on this any further because I still have to 
>read about XQuery.

No, you don't.

DTD validation semantics do not allow an element by name to have two 
different constraints on the content model in the same document.

W3C Schema validation semantics allow two elements by name that are not 
siblings to have two different constraints on the content model in the same 
document.

RELAX-NG validation semantics allow two elements by name to have two 
different constraints on the content model anywhere in the same document.
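
The three statements above can be sketched as a toy check. This is a deliberate simplification, not a real validator: each declaration is reduced to a hypothetical (parent name, element name, content model) tuple, and the function names are assumptions made for illustration. In particular, keying W3C Schema declarations by parent name stands in for "within the same content model".

```python
from collections import defaultdict

# Each declaration is (parent element name, element name, content model).
DIFFERENT_PARENTS = [("a", "p", "c, d"), ("b", "p", "c, e")]
SAME_PARENT = [("doc", "p", "c, d"), ("doc", "p", "c, e")]


def dtd_allows(decls):
    """DTD: one content model per element name in the whole document."""
    models = defaultdict(set)
    for _parent, name, model in decls:
        models[name].add(model)
    return all(len(found) == 1 for found in models.values())


def wxs_allows(decls):
    """W3C Schema: same-name siblings (same parent) must share one content model."""
    models = defaultdict(set)
    for parent, name, model in decls:
        models[(parent, name)].add(model)
    return all(len(found) == 1 for found in models.values())


def relaxng_allows(decls):
    """RELAX NG: same-name elements may have different content models anywhere."""
    return True


for label, decls in (("different parents", DIFFERENT_PARENTS), ("same parent", SAME_PARENT)):
    print(label, dtd_allows(decls), wxs_allows(decls), relaxng_allows(decls))
```

Under these assumptions, the two kinds of "p" are rejected by the DTD rule in both cases, accepted by the W3C Schema rule only when they have different parents, and accepted by the RELAX NG rule in both cases.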

>After that, maybe, your words will have a new meaning to me.

Please don't get distracted by fragments, query, or link ... that is 
irrelevant to my point about flexibility and about using one expression of 
constraints and a mechanical derivation of a new set of constraints from 
old constraints.  By "mechanical", I mean without human intervention and 
without considerable heuristics that would mimic human intervention ... I 
just mean the simple introduction of a choice that could be easily mechanized.

>     It seems that Relax NG is very popular in certain circles. However the
>author of "Professional XML" states that Relax NG is not meant to replace XML
>Schema:
>
>"All of these proposals might be seen as providing lighter-weight alternatives
>to an implementation using XML Schemas. None of them (except perhaps DSD) are
>intended to replace XML Schema since it has many capabilities that are not
>present in these other proposals."
>
>     The author was speaking about DSD (Document Structure Description), RELAX
>(Regular Language for XML), TREX (Tree Regular Expressions for XML) and
>Schematron.

RELAX-NG and Schematron are ISO standards in the DSDL (Document Schema 
Definition Languages) family - ISO/IEC 19757 (parts 2 and 3) that address 
different requirements for XML document validation semantics and the 
expressions of constraints.

Different problems need solving with different expressions of constraints 
and someone who needs to model a document needs to use the expression 
language that satisfies the modeling semantics they need.

W3C Schema is a set of type-based constraint expression semantics.

RELAX-NG is a set of grammar-based constraint expression semantics.

Schematron is a set of assertion-based constraint expression semantics.

Anyone who tells you one constraint expression language is trying to
replace another is, in my opinion, neither being fair nor understanding
the role that constraint expressions can play in validating XML documents.

Each of the above has pluses and minuses.  Everyone should assess their
requirements for expressing constraints and then choose the appropriate
language that has the required validation semantics.  Vendors
who tell you there is only one sanctioned schema language for XML are 
purposely misleading you.  Choose the one that meets your needs.  Choose 
different ones for different documents if you need more than one.  Choose 
different ones for the same document if the document life cycle dictates 
changing needs for validation.

You asked about flexibility and I was commenting on areas where W3C Schema 
is not as flexible as RELAX-NG.  Choosing to use a set of flexible 
validation semantics will make your XML flexible for the future.  Choosing 
to use a set of validation semantics that are not flexible will box you in 
and keep you from expressing what you need in the future.

Throwing things into mixed content doesn't give one any ability to
constrain the sequence and order of those constructs, so I am unable to
use this putative "extension mechanism" to meet my business 
objectives.  It, in my opinion, does not make my XML at all flexible ... 
just gives it the opportunity to be messy without being able to constrain 
it.  I think it is a very wrong answer to making something "flexible".

> > So, to me, these are the kinds of questions to ask to deduce the nature of
> > "what is flexible XML?" ... not anyone's particular choice of a given
> > vocabulary.
>
>     You are basically saying that there are no special design decisions to
> make when you want to design a "flexible and open to future changes" XML
> document.

Yes.  That is my point.  Anyone who uses XML will face the need to address 
changes, and choosing to use a set of validation semantics that 
accommodates change will best serve their needs.  One cannot anticipate all 
the possible changes that might be needed.

>     A second thought: the vocabulary may not be important but the way you
>declare the relationships between elements could impact the future
>extensibility of your documents.

Indeed.  Absolutely.  That is my point.

>I will think about this.

I'm glad to hear this ... and please don't think that any one expression of 
constraints is the be all and end all of constraint expression.  Choose the 
one that meets your business needs, your technology needs, your training 
needs, and your comprehension ... hopefully one language will meet all 
these for you but if you need more than one, they are all there for the taking.

Note the DSDL title is plural "Document Schema Definition Languages" ... 
this project explicitly assumes that there are many schema expression 
languages for many purposes and that they should all work together to the 
best of their respective abilities.

Now that RELAX-NG and Schematron are both ISO standards, I anticipate more 
industry (read "vendor") acceptance.

Back to my example of a generic query, below is an XML document named 
"query.xml" ... in it are two different kinds of <p>, one in the context of 
<a> and the other in the context of <b>.

I query the document (by whatever means) and I get "queryres.xml" ... in it 
are all of the <p> elements as siblings.
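
As one possible reading of "by whatever means", the query could be sketched with Python's standard xml.etree.ElementTree module. The function name collect_p and the compact inline copy of the sample document are assumptions for illustration; this is not how "queryres.xml" was actually produced.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample input: <p> occurs in the context of <a> and of <b>.
QUERY_XML = """<doc>
   <a><p><c/><d/></p></a>
   <b><p><c/><e/></p></b>
   <a><p><c/><d/></p></a>
</doc>"""


def collect_p(source_xml: str) -> ET.Element:
    """Return a new <doc> whose children are all <p> elements, in document order."""
    source = ET.fromstring(source_xml)
    result = ET.Element("doc")
    for p in source.iter("p"):
        result.append(p)
    return result


result = collect_p(QUERY_XML)
child_names = [[child.tag for child in p] for p in result]
print(child_names)
print(ET.tostring(result, encoding="unicode"))
```

The result holds three sibling <p> elements whose children are c/d, c/e, and c/d, which is the shape that the schema expressions then have to describe.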

The original "query.xml" validates using the constraints expressed in 
"query.wxs".  The constraints "queryaut.wxs" could be derived mechanically 
by introducing a choice around both kinds of <p> elements, but you can see 
that W3C Schema constraint semantics do not allow this 
automatically-generated schema expression to validate the result 
document.  The hand-authored "queryres.wxs" does validate the result just 
fine, but I had to introduce the choice with some thought, not with a 
simple union choice of the two kinds of <p>.  Human intervention (or a lot 
of heuristics I wouldn't want to have to program) is required because of 
the restrictions of the constraint semantics in W3C Schema.

This is not true in RELAX-NG where I can express below in "queryres.rnc" 
the simple choice between two different kinds of <p> as siblings.
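
To suggest how little machinery such a mechanical union needs, here is a minimal sketch that only generates RELAX NG compact-syntax text for a choice between two patterns. The helper names element and union are hypothetical, and the sketch does not invoke a validator; checking the output would still need a tool such as Jing.

```python
def element(name: str, content: str = "empty") -> str:
    """Render one RELAX NG compact-syntax element pattern."""
    return f"element {name} {{ {content} }}"


def union(*patterns: str) -> str:
    """Mechanical union: a plain choice between the given patterns."""
    return "(" + " | ".join(patterns) + ")"


# The two kinds of <p>, one from each original context.
p_in_a = element("p", ", ".join([element("c"), element("d")]))
p_in_b = element("p", ", ".join([element("c"), element("e")]))

# One or more <p> of either kind as children of <doc>.
schema = "start = " + element("doc", union(p_in_a, p_in_b) + "+")
print(schema)
```

No analysis of the two content models is needed: the union is formed by joining the patterns with "|", which is the step that cannot be carried over directly to W3C Schema for same-name siblings.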

I've run MSV and Jing in the examples below to confirm my results.

Remember I introduced this as an argument regarding flexibility ... 
choosing W3C Schema is not flexible in some ways because it does not allow 
mixing sibling elements of the same name with different content models.  As 
business requirements change, content models change, and it could be very 
easy for an evolving document model to need to accommodate two elements of 
the same name but different content models.  Such "growth" in validation 
requirements is quite flexibly met in the RELAX-NG validation semantics, as 
illustrated below.

I hope you find this helpful.

........................... Ken


R:\samp>type query.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<doc>
   <a>
     <p>
       <c/><d/>
     </p>
   </a>
   <b>
     <p>
       <c/><e/>
     </p>
   </b>
   <a>
     <p>
       <c/><d/>
     </p>
   </a>
</doc>
R:\samp>call msv query.wxs query.xml
No validation errors.

R:\samp>type queryres.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<doc>
   <p>
     <c/><d/>
   </p>
   <p>
     <c/><e/>
   </p>
   <p>
     <c/><d/>
   </p>
</doc>
R:\samp>call msv queryaut.wxs queryres.xml
start parsing a grammar.
validating queryres.xml
Error at line:4, column:13 of file:///R:/samp/queryres.xml
   tag name "d" is not allowed. Possible tag names are: <e>

Error at line:10, column:13 of file:///R:/samp/queryres.xml
   tag name "d" is not allowed. Possible tag names are: <e>

the document is NOT valid.

R:\samp>call msv queryres.wxs queryres.xml
No validation errors.

R:\samp>jing -c queryres.rnc queryres.xml

R:\samp>type query.wxs
<?xml version="1.0" encoding="utf-8"?>
<wxs:schema xmlns:wxs="http://www.w3.org/2001/XMLSchema">

<wxs:element name="doc">
   <wxs:complexType>
     <wxs:choice maxOccurs="unbounded">
       <wxs:element name="a">
         <wxs:complexType>
           <wxs:sequence>
             <wxs:element name="p" maxOccurs="unbounded">
               <wxs:complexType>
                 <wxs:sequence>
                   <wxs:element name="c">
                     <wxs:complexType/>
                   </wxs:element>
                   <wxs:element name="d">
                     <wxs:complexType/>
                   </wxs:element>
                 </wxs:sequence>
               </wxs:complexType>
             </wxs:element>
           </wxs:sequence>
         </wxs:complexType>
       </wxs:element>
       <wxs:element name="b">
         <wxs:complexType>
           <wxs:sequence>
             <wxs:element name="p" maxOccurs="unbounded">
               <wxs:complexType>
                 <wxs:sequence>
                   <wxs:element name="c">
                     <wxs:complexType/>
                   </wxs:element>
                   <wxs:element name="e">
                     <wxs:complexType/>
                   </wxs:element>
                 </wxs:sequence>
               </wxs:complexType>
             </wxs:element>
           </wxs:sequence>
         </wxs:complexType>
       </wxs:element>
     </wxs:choice>
   </wxs:complexType>
</wxs:element>

</wxs:schema>

R:\samp>type queryaut.wxs
<?xml version="1.0" encoding="utf-8"?>
<wxs:schema xmlns:wxs="http://www.w3.org/2001/XMLSchema">

<!--the following automated choice between content models doesn't
     work as validation triggers on only the first definition of <p> -->
<wxs:element name="doc">
   <wxs:complexType>
     <wxs:choice maxOccurs="unbounded">
       <wxs:element name="p" maxOccurs="unbounded">
         <wxs:complexType>
           <wxs:sequence>
             <wxs:element name="c">
               <wxs:complexType/>
             </wxs:element>
             <wxs:element name="e">
               <wxs:complexType/>
             </wxs:element>
           </wxs:sequence>
         </wxs:complexType>
       </wxs:element>
       <wxs:element name="p" maxOccurs="unbounded">
         <wxs:complexType>
           <wxs:sequence>
             <wxs:element name="c">
               <wxs:complexType/>
             </wxs:element>
             <wxs:element name="d">
               <wxs:complexType/>
             </wxs:element>
           </wxs:sequence>
         </wxs:complexType>
       </wxs:element>
     </wxs:choice>
   </wxs:complexType>
</wxs:element>

</wxs:schema>
R:\samp>type queryres.wxs
<?xml version="1.0" encoding="utf-8"?>
<wxs:schema xmlns:wxs="http://www.w3.org/2001/XMLSchema">

<!--the following hand-crafted version works just fine-->
<wxs:element name="doc">
   <wxs:complexType>
     <wxs:sequence>
       <wxs:element name="p" maxOccurs="unbounded">
         <wxs:complexType>
           <wxs:sequence>
             <wxs:element name="c">
               <wxs:complexType/>
             </wxs:element>
             <wxs:choice>
               <wxs:element name="e">
                 <wxs:complexType/>
               </wxs:element>
               <wxs:element name="d">
                 <wxs:complexType/>
               </wxs:element>
             </wxs:choice>
           </wxs:sequence>
         </wxs:complexType>
       </wxs:element>
     </wxs:sequence>
   </wxs:complexType>
</wxs:element>

</wxs:schema>

R:\samp>type queryres.rnc

start = element doc
    {
       (
          element p
          {
             element c { empty },
             element d { empty }
          }
          |
          element p
          {
             element c { empty },
             element e { empty }
          }
       )+
    }

# end of file


--
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/x/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Breast Cancer Awareness  http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal