   Re: [xml-dev] Schema Extensibility

At 2006-03-01 11:47 +0000, Fraser Goffin wrote:
>Personally I have been relating these comments to XML schema 
>structures but I could have easily been talking about the service 
>interface supported by those schema. This has highlighted some 
>different opinions about the value of various approaches to this 
>problem which I hope have resonated with those following the thread.
>...
>I guess the thing that I am still mostly undecided about is to do 
>with whether to allow for schema extensibility (using xs:any 
>together with the 'sentry' approach proposed by David Orchard (and 
>others)) or whether this is a recipe for an uncontrollable vocabulary.

I think the latter.

>I think the battle-ground is in part characterised by a schema (or 
>service) that, once published is considered as immutable, hence any 
>changes REQUIRE a NEW VERSION with a NEW NAMESPACE, versus a schema 
>which allows non breaking changes to be introduced by both the 
>schema owner and non schema authors and supports both forward and 
>backwards compatibility.

I feel there are manageable ways to accommodate changing namespaces 
in stylesheet libraries and other downstream processes.  Namespace 
URI strings are, after all, just strings, and both XML (with 
entities) and programming languages have imaginative ways to work with strings.
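
As a minimal sketch of the "just strings" point, here is how a 
downstream process might take the namespace URI as a parameter.  This 
is Python with the standard xml.etree.ElementTree module; the two URIs 
and the Order/ID element names are hypothetical, made up only for 
illustration and not taken from any published vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for two versions of one vocabulary.
ORDER_NS_V1 = "urn:example:order:1"
ORDER_NS_V2 = "urn:example:order:2"

def order_ids(xml_text, ns_uri):
    """Return the text of every ID element in the given namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{uri}local", so the
    # namespace is an ordinary string substituted into the lookup.
    return [e.text for e in root.iter("{%s}ID" % ns_uri)]

doc_v1 = '<o:Order xmlns:o="urn:example:order:1"><o:ID>A1</o:ID></o:Order>'
doc_v2 = '<o:Order xmlns:o="urn:example:order:2"><o:ID>B7</o:ID></o:Order>'

print(order_ids(doc_v1, ORDER_NS_V1))  # ['A1']
print(order_ids(doc_v2, ORDER_NS_V2))  # ['B7']
```

The same function serves both versions; only the string handed to it 
changes when the vocabulary moves to a new namespace.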

Namespaces provide disambiguation and global type identity (labels) 
... if the processing of an information item changes, or the 
processing of a collection of information items in a vocabulary 
changes, then using a new namespace unambiguously indicates that 
there is something different than before.

>The first situation is a 'clean' and explicit model where the 
>semantics are guaranteed not to be usurped by a non schema owner but 
>where even relatively minor change requirements can have a large 
>impact to implementations (especially when there are a large number 
>of external users of this vocabulary).

Indeed.

>The schema extensibility approach promotes the idea that 
>organisations may want to represent private relationships using data 
>carried at specified points within the standard schema in such a way 
>that that data is only relevant between those parties (using a 
>foreign namespace) and all others can safely ignore it (and that the 
>schema author should not necessarily attempt to constrain this type of usage).

A very important issue, and one that needs to be addressed in UBL.

>some TP extensions may be incorporated back into the main body of 
>the standard at a later point in which case any pair of parties 
>using that extension can agree a move back to the standard 
>definition, at a time of their choosing.

But in the meantime trading partners can continue to use the 
sacrosanct structures and just embed in them richness that is 
important to them, provided that the standardized structures 
(possibly redundantly) carry the aggregate information.  This allows 
compliant applications unaware of the embedded richness to still "do 
their thing" with the recognized standardized constructs.

>It also allows the schema owner to add non breaking 'compatible' 
>change to a schema. The down sides seem to be, that a TP could 
>introduce changes which subvert the intended semantics, and that, 
>over time, what might have started out as a temporary expedient, 
>turns into an entrenched working implementation that is unlikely to 
>be allocated budget to be re-synchronised with the standard.

I'm not so sure that embedding foreign information items into a given 
structure would necessarily change the semantics of the information 
in that structure.

I've long held that semantics are in the eye of the consuming process 
and that information *means* only what the recipient wants it to 
mean.  Of course reliable discourse happens when the recipient 
interprets it to mean what the sender intended, but the recipient can 
also choose to interpret it any way they want for their own 
purpose.  Therefore, say with UBL, if I have a UBL processing 
application that understands the meaning of the information labelled 
according to the labels published by the UBL TC, no amount of 
embedded foreign information is going to impact my semantic 
interpretation of what the committee intended.

It does put the burden on the sender, though, not to ignore the 
semantics represented by the labels chosen by the TC, so it would 
benefit the sender to respect the vocabulary labels and populate the 
structures with meaningful information to a downstream conformant UBL 
processor.  But a trading partner who understands the foreign 
information will suddenly have the additional information available 
to them because they will have a semantic understanding of the 
information found with the foreign labels.

XML doesn't "do" semantics; I believe it just labels the information 
in the structures with rich, globally-unique, namespace-based labels 
to effect interchange without ambiguity in, or loss of, the labeling 
of the information.  How the sender and receiver trading partners interpret 
the semantics of the information at those labels is their business, 
and their business will flourish if they have the same 
understanding.  XML won't give them that magic understanding.

>So, in part the question is, should a schema allow for unknown 
>extensions for unknown purposes (but in specified locations) and 
>still be considered as 'compliant', or should schema authors attempt 
>to constrain (eliminate) that behaviour.

Neither, I believe.

>I can't help feeling the attraction of the second model, but my 
>'gut' tells me that something as inflexible will soon become a 
>business constraint and that will signal its demise.

Extensions are, I believe, out of scope of the original vocabulary, 
and therefore, "none of the business" of the original vocabulary and 
"not even a worry" to the original vocabulary (or its creators!).

It happens that last night I expounded on this very point to the UBL 
committee in order to present how I believe trading partner 
extensions to UBL can be easily accommodated *by doing nothing* 
within the UBL structures:

   http://lists.oasis-open.org/archives/ubl/200602/msg00117.html

In that posting I present the scenario that the UBL TC has 
standardized what an Order is, but that two trading partners in the 
aerospace industry need to augment the Order with richness important 
to them, yet they don't want to violate UBL or be considered 
non-compliant.  I posit that the aerospace industry *can do anything 
they want* to augment a UBL Order and they will *still be UBL 
compliant* if they use the UBL Order information compliant with the 
semantics attached to those labels by the UBL Technical Committee 
(and published in five languages so far) in the instance they exchange.

I demonstrate how NVDL can be used for just this purpose, *without 
making a single change to the read-only UBL document models as 
published* and I come to the conclusion:

At 2006-02-28 21:57 -0500, G. Ken Holman wrote:
>I see the basic premise as:
>
>  - the UBL information in an order instance has to conform to the 
> sacrosanct, read-only document models created by the UBL Technical Committee;
>  - at the least, a user of augmented orders must fill in the UBL 
> fields so that recipients who do not recognize the augmentations 
> can ignore them because the fields they do recognize they know what to do with;
>  - users who choose to recognize the embedded augmentations can do 
> what they wish with them, just the act of having them doesn't 
> "disturb" the UBL information in the instance.
>
>This is a different way than the traditional way of looking at 
>document validation where you have to have the one model of 
>everything in the instance, but it really isn't foreign.  Consider 
>that you have an XHTML document ... if you choose to embed an SVG 
>image in the middle of the document, you still really do have an 
>XHTML document just with something inside.  Why burden the XHTML 
>document model with knowledge of SVG details in order to do 
>validation?  With namespace-based validation dispatching, the 
>detection of SVG in an XHTML document can trigger the validation of 
>the SVG component with the SVG model, while making the SVG invisible 
>to the validation of the wrapping XHTML from the XHTML model.
>
>So as trading partners we don't "validate an instance as valid UBL"; 
>instead we "validate the UBL information in an instance as valid 
>UBL", as well as checking whatever other information we might also 
>have in our instance that is important to our exchange.

That last paragraph is the important change in perspective that I'm 
trying to bring to light to the committee.  There are some who still 
hold with a traditional view that the entire instance *has a model*, 
rather than the different view that sets of labeled information found 
in an instance *each have their own model* (and when there is only 
one model then the model is for the entire instance, but that is just 
an edge case; granted one that we've been using all along for 
markup).  And those sets are identified unambiguously through the use 
of namespace-rich labels.

Accommodating "the entire XML instance has a model" is, I believe, 
more difficult, time consuming and frustrating than accommodating 
"each set of information found in an XML instance has its own model".
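
To make the "each set has its own model" view concrete, here is a 
rough sketch in Python (standard library only) of the splitting step 
that namespace-based dispatching implies.  It is not NVDL and not the 
method from my UBL posting: the function names are hypothetical, the 
rule "anything not in the root's namespace is foreign" is a simplifying 
assumption, and the actual validation of each piece against its own 
model is left out.

```python
import copy
import xml.etree.ElementTree as ET

def namespace_of(elem):
    """Return the namespace URI of an element, or "" if it has none."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def split_by_namespace(root):
    """Return (host, islands): a copy of the tree with foreign-namespace
    subtrees pruned, plus the list of the pruned subtrees."""
    host = copy.deepcopy(root)
    host_ns = namespace_of(host)
    islands = []

    def prune(parent):
        for child in list(parent):
            if namespace_of(child) != host_ns:
                parent.remove(child)      # invisible to the host's model
                islands.append(child)     # to be checked by its own model
            else:
                prune(child)

    prune(host)
    return host, islands

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Figure:</p>
    <svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>
  </body>
</html>"""

host, islands = split_by_namespace(ET.fromstring(doc))
print([namespace_of(e) for e in islands])
print(host.find(".//{http://www.w3.org/2000/svg}svg"))
```

The pruned host would then go to an XHTML validator and each island to 
a validator chosen by its namespace, which is the division of labour an 
NVDL script expresses declaratively.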

At 2006-03-01 11:47 +0000, Fraser Goffin wrote:
>With my SOA hat on I would recognise the importance of 
>interoperability and the significant role that standardised 
>vocabularies have to play.

Great!  Standardized vocabularies give us the labels with which we 
can identify the information unambiguously so that we hopefully apply 
the understood published semantics against the so-labeled information.

>I also don't especially want to promote the myriad of point-to-point 
>relationships that 'going private' implies and instead want to 
>leverage the 'reach' of a market standard.

All power to you!  And I believe it can be done *arbitrarily* between 
trading partners without impacting the integrity of the standardized 
vocabulary.

>Personally I still have no definitive conclusion that I feel 
>comfortable in turning into a recommended approach within my own 
>organisation and within the industry standards body that I work with 
>from time to time, so I thought I'd give it one more go.

I've come to the conclusion that the technology standards are in 
place, and the tools are coming, with which these problems are 
addressed.  Committees like the UBL TC can go on their way doing their 
own thing, standardizing a set of labels and understood semantics 
as a platform.  Anyone wishing to augment those labels with their 
own, representing their own semantic concepts, can then do so without 
worry and without upsetting the standards.

Note that this is not an official position held by the UBL TC, as it 
was only yesterday that I expounded on my ideas to the committee.  I 
cannot represent the above as an official UBL description of its 
extensibility, only as my input to the UBL discussion of 
extensibility.  When the TC comes to a decision of how to accommodate 
extensibility in UBL structures, this will be documented in detail to 
help UBL users.  I do gather there is some resonance in the TC 
regarding my input, but I have also heard some reluctance of "but 
that isn't how we've been thinking of doing it and we need to do it 
this other way (e.g. W3C Schema ANY) like other projects".

>Some of the issues and comments highlighted by the earlier thread 
>are provided below. Some are direct quotes from contributors, others 
>are excerpts from various ramblings :-)

I feel that it is out of scope to have to think of how to accommodate 
the arbitrary (and imaginative!) ways people might want to augment my 
structures.  If it were left to me to make my structures extensible, 
whatever way I chose would make someone unhappy because they wouldn't 
be able to extend it the way they want.  By punting on the whole 
issue, I can focus on my structures, I can prepare my processing 
applications to accommodate (and ignore) the presence of foreign 
content, and go about my business while others can augment what I do 
to meet their purposes provided they don't violate my processing 
systems by ambiguously using the labels I've published as my vocabulary.
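
A small sketch of what "accommodate (and ignore)" might look like in a 
consuming application, again in Python with the standard library.  The 
namespace URIs and element names are invented for illustration and are 
not UBL's; the reader picks up only the children carrying the labels it 
knows and silently skips everything else.

```python
import xml.etree.ElementTree as ET

ORDER_NS = "urn:example:order"   # hypothetical standard vocabulary

def read_order(xml_text):
    """Collect the child fields in ORDER_NS; skip foreign content."""
    root = ET.fromstring(xml_text)
    prefix = "{%s}" % ORDER_NS
    fields = {}
    for child in root:
        if child.tag.startswith(prefix):
            fields[child.tag[len(prefix):]] = child.text
        # Any other namespace is foreign: neither an error nor our concern.
    return fields

augmented = (
    '<Order xmlns="urn:example:order" xmlns:aero="urn:example:aerospace">'
    '<ID>A1</ID>'
    '<aero:AirworthinessCert>C-42</aero:AirworthinessCert>'
    '<Quantity>3</Quantity>'
    '</Order>'
)

print(read_order(augmented))  # {'ID': 'A1', 'Quantity': '3'}
```

A trading partner who does understand the aerospace labels can read 
them from the same instance with a reader of their own.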

BTW, XSL-FO has done this since 2001 when it was published ... XSL-FO 
processors accommodate the presence of foreign namespace labels in 
the information structures and quietly ignore their content ... and 
because of this I have done some very imaginative and fun things in 
UBL annotating XSL-FO instances to synthesize XSLT stylesheets that 
then get processed for production use (freely available from our web 
site for those interested).  The XSL-FO designers had no idea what I 
wanted to do to augment their work, but I was able to do it without 
needing any extensibility structures built into their vocabulary.  I 
demonstrated this use of namespaces years ago and I have long thought 
that this benign accommodation of foreign content using namespaces is 
the ultimate in extensibility.

I hope this helps.

. . . . . . . . . . . . . . . Ken

--
Upcoming XSLT/XSL-FO hands-on courses: Washington,DC 2006-03-13/17
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/x/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Cancer Awareness Aug'05  http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal





 
