Re: [xml-dev] Open XML Markup Compatibility

From: Greg Hunt <greg@firmansyah.com>
To: Fraser Goffin <goffinf@hotmail.com>
Date: Wed, 13 Sep 2006 08:23:38 +1000
Fraser,
There needs to be a way of handling this stuff, but I am not sure that 
this is the right way to do it.  I had a quick look at the MS spec.  
Frankly it looks awfully complex for what you get.  The idea of "Must 
Understand" is really "must be able to process this bit of this 
structure version".  If the consuming application is tightly tied to the 
data, shouldn't this just be handled through a different structure 
version?  If we are dealing with data-oriented XML, there will be an 
application that consumes the structure and that also needs to be updated 
to handle the different structure, so the benefit of the MS approach is 
smaller.  If we are dealing with document- or text-oriented XML, then this 
is a rather more plausible approach, because we can assume that the 
structure is more plastic and that the consuming application is not 
tightly bound to the structure that it processes.
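To make the "must understand" point concrete, here is a minimal sketch. It assumes a hypothetical `mustUnderstand` attribute on the root element that lists namespace URIs; the MS spec has its own MustUnderstand syntax, which is not reproduced here, and the namespace URIs and `check_must_understand` are illustrative only. A consumer that meets a required namespace it does not know has to refuse the document, which in practice is the same outcome as refusing an unrecognised structure version.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute name; not the exact syntax of the MS specification.
MUST_UNDERSTAND = "mustUnderstand"

# Namespaces this particular consumer was built to process (illustrative URIs).
KNOWN_NAMESPACES = frozenset({"urn:example:order:v1"})


def check_must_understand(xml_text, known=KNOWN_NAMESPACES):
    """Parse the document and return its root element, or raise ValueError
    if the producer marked a namespace as must-understand that this consumer
    does not know."""
    root = ET.fromstring(xml_text)
    required = root.get(MUST_UNDERSTAND, "").split()
    missing = [ns for ns in required if ns not in known]
    if missing:
        # Refusing here is effectively refusing an unknown structure version.
        raise ValueError("cannot process required namespaces: " + ", ".join(missing))
    return root


# Usage: a producer requires a tax extension this consumer has never seen.
doc = ('<order xmlns="urn:example:order:v1" '
       'mustUnderstand="urn:example:order:v1 urn:example:tax:v2"/>')
try:
    check_must_understand(doc)
except ValueError as err:
    print(err)  # cannot process required namespaces: urn:example:tax:v2
```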

Putting this in-band in this way means that everyone shares the same 
processing requirement.  Everyone must process the same thing in the 
same way. For some industries this may be a reasonable requirement, but 
if the consumers have different interests in the message (having 
different types of processing), this approach devalues the idea of 
understanding or processing. Some consumers will understand or process 
some data by dropping it on the floor.  Statistical processors will 
understand the data by recording its existence or by ignoring it.
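As an illustration of a consumer that "understands" data only by recording its existence, here is a minimal sketch of a statistical processor that tallies elements from namespaces it does not know instead of interpreting them. The function names and namespace URIs are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def namespace_of(tag):
    """ElementTree writes qualified names as '{uri}local'; return the uri."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def tally_unknown(xml_text, known_namespaces):
    """Count elements per unknown namespace; their content is never read."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        ns = namespace_of(elem.tag)
        if ns not in known_namespaces:
            counts[ns] += 1
    return counts


doc = ('<order xmlns="urn:example:order:v1" xmlns:x="urn:example:ext">'
       '<x:note/><x:flag/><line/></order>')
print(tally_unknown(doc, {"urn:example:order:v1"}))
# Counter({'urn:example:ext': 2})
```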

The MS markup moves the complexity from the consumer to the producer. 
That is fine for MS, with a (probably) large population of 
difficult-to-update clients that can support either specialised XML 
parsers (really this should be a kind of extension to a validating 
parser, or should use the preprocessing model that the MS document 
suggests) or much more complex application-level interpretation of the 
XML.  In a scenario where there are multiple consumers of data-oriented 
XML on differing technology platforms, it is harder to see the model 
being readily (and reliably) implementable.  Where the number of 
consumers is small, simply issuing a new version of a structure looks 
like a more feasible approach.  With a large number of consumers, the 
limiting issue seems to be the availability and complexity of the 
parsing support.
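The preprocessing model referred to above can be sketched as a filter that runs before the application sees the document: content in namespaces the producer declared ignorable is stripped if the consumer does not understand it, and any other unknown namespace is an error. This is a simplified sketch, assuming a hypothetical root-level `ignorable` attribute holding namespace URIs; the MS spec's Ignorable attribute uses prefixes and adds further rules (such as ProcessContent) that are not modelled here.

```python
import xml.etree.ElementTree as ET


def _ns(name):
    """Namespace URI of an ElementTree '{uri}local' name ('' if none)."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def preprocess(xml_text, understood):
    """Strip ignorable, not-understood content before the application runs.

    The root element itself is assumed to be understood. Unprefixed
    attributes have no namespace and are always kept.
    """
    root = ET.fromstring(xml_text)
    ignorable = set(root.get("ignorable", "").split())  # hypothetical attribute

    def visit(elem):
        for name in list(elem.attrib):
            ns = _ns(name)
            if ns and ns not in understood:
                if ns not in ignorable:
                    raise ValueError("unknown attribute namespace: " + ns)
                del elem.attrib[name]
        for child in list(elem):
            ns = _ns(child.tag)
            if ns in understood:
                visit(child)
            elif ns in ignorable:
                elem.remove(child)  # the whole subtree is dropped
            else:
                raise ValueError("unknown element namespace: " + ns)

    visit(root)
    return root


doc = ('<order xmlns="urn:example:order:v1" xmlns:x="urn:example:ext" '
       'ignorable="urn:example:ext"><x:note/><line x:flag="1">2</line></order>')
cleaned = preprocess(doc, {"urn:example:order:v1"})
print([child.tag for child in cleaned])  # ['{urn:example:order:v1}line']
```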

The alternateContent stuff just makes me dizzy: an alternation mechanism 
in the schema language and a runtime version on top of it.  This really 
looks like a textual XML idea rather than a data-oriented one.  Had 
XML Schema addressed these kinds of data-oriented versioning issues, we 
might have had tool support for this kind of thing, but absent that tool 
support I am not convinced that there is a low-complexity/low-risk 
approach to this stuff outside of the XML Schema extension/restriction 
and substitution group features.
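For readers who have not seen it, the runtime alternation works roughly like this: an AlternateContent element wraps one or more Choice branches, each declaring what it requires, plus a Fallback, and the consumer keeps the first branch it can handle. The sketch below is a simplification under stated assumptions: the element names echo the MS spec, but the namespace URI is a placeholder and `Requires` here lists namespace URIs rather than prefixes.

```python
import xml.etree.ElementTree as ET

MC = "urn:example:compat"  # placeholder, not the specification's namespace URI
ALTERNATE = "{%s}AlternateContent" % MC
CHOICE = "{%s}Choice" % MC
FALLBACK = "{%s}Fallback" % MC


def select_branch(alternate, understood):
    """Children of the first Choice whose requirements are all understood,
    otherwise the children of Fallback, otherwise nothing."""
    for option in alternate:
        if option.tag == CHOICE:
            if set(option.get("Requires", "").split()) <= understood:
                return list(option)
        elif option.tag == FALLBACK:
            return list(option)
    return []


def resolve_alternates(parent, understood):
    """Replace every AlternateContent below `parent` with its selected branch."""
    resolved = []
    for child in list(parent):
        branch = select_branch(child, understood) if child.tag == ALTERNATE else [child]
        for elem in branch:
            resolve_alternates(elem, understood)  # branches may nest alternates
            resolved.append(elem)
    for child in list(parent):
        parent.remove(child)
    parent.extend(resolved)
    return parent


doc = ('<doc xmlns:mc="urn:example:compat"><mc:AlternateContent>'
       '<mc:Choice Requires="urn:example:v2"><fancy/></mc:Choice>'
       '<mc:Fallback><plain/></mc:Fallback>'
       '</mc:AlternateContent><keep/></doc>')
root = resolve_alternates(ET.fromstring(doc), {"urn:example:v1"})
print([child.tag for child in root])  # ['plain', 'keep']
```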

I understand the desire to enable this type of control, but I am not 
sure that this is the best way to implement it.  The reference below to 
a consumer having to forward data that is not understood to an upstream 
consumer makes me nervous; it adds a level of uncertainty to the 
interface between the first and second consumers.  In the environment 
that I am working in, we do not have this kind of long-chain 
relationship.  Data will be retained for audit purposes, but the data 
size/storage space is not yet a major issue for us.
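The forwarding concern is easier to see with the two Must Ignore Unknown variants side by side. This is a minimal sketch, assuming that only direct children of the root are extension points and that "understood" simply means "in a namespace this consumer knows"; `forward` and the namespace URIs are hypothetical. In both modes the unknown content is ignored and an audit copy is kept; only "retain" passes it on to the next consumer.

```python
import xml.etree.ElementTree as ET


def forward(xml_text, understood, mode):
    """Apply Must Ignore Unknown to the root's direct children.

    Returns (message to pass on, audit copies of the ignored content).
    mode="discard": unknown children are removed from the forwarded message.
    mode="retain":  unknown children are ignored but forwarded unchanged.
    """
    if mode not in ("discard", "retain"):
        raise ValueError("mode must be 'discard' or 'retain'")
    root = ET.fromstring(xml_text)
    audit = []
    for child in list(root):
        ns = child.tag[1:].split("}", 1)[0] if child.tag.startswith("{") else ""
        if ns in understood:
            continue  # known content would be processed normally here
        audit.append(ET.tostring(child, encoding="unicode"))
        if mode == "discard":
            root.remove(child)
    return ET.tostring(root, encoding="unicode"), audit


doc = ('<order xmlns="urn:example:order:v1" xmlns:x="urn:example:ext">'
       '<line/><x:note/></order>')
for mode in ("discard", "retain"):
    message, audit = forward(doc, {"urn:example:order:v1"}, mode)
    print(mode, len(ET.fromstring(message)), len(audit))
# discard 1 1
# retain 2 1
```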

Greg

Fraser Goffin wrote:
> A while back I posted a question about use of the Must Ignore Unknown 
> (retain/discard) pattern described (primarily) by David Orchard as an 
> approach to processing XML instances containing allowable content that 
> MAY be ignored by a receiver if that content is 'not understood' (see 
> below for original post).
>
> One aspect of this which troubled me slightly was how the 
> communicating parties agree on what content can/should be ignored and 
> what content can/should be retained. In some vocabularies a protocol 
> agreement is made which can be asserted at run-time (e.g. ebXML CPA) 
> but for many exchanges, that protocol agreement (particularly in a B2B 
> scenario) may just be the subject of 'out of band' 
> discussion/documentation and oftentimes will not cover this aspect 
> specifically (or at least it may only come up some time after the 
> original agreement was made).
>
> For a while I have had a document called 'Open XML Markup 
> Compatibility' (http://www.microsoft.com/whdc/xps/xmcompatspec.mspx) 
> and today I gave it a read over. It's basically a specification which 
> describes formal annotations that can be used to assert 
> 'mustUnderstand', 'ignorable' and 'preserve content' requirements for 
> message exchanges. That all sounds good. But I am wondering whether 
> anyone out there a) agrees that it *is* useful/necessary, b) is using 
> it in anger (or something equivalent).
>
> My original post had no responses but I'm not sure if that was because 
> no-one is really all that bothered about this subject (for us it has 
> some potential in our versioning strategy)?
>
> Regards
>
> Fraser.
>
> ==== original post - Must Ignore Unknown (retain/discard) - August 30 2006 ====
>
> Many of you will be familiar with this term which is used to describe
> an approach to processing XML instances containing allowable content
> that MAY be ignored by a receiver if that content is 'not understood'.
> 'Not understood' is typically related to content in particular
> locations (extension points) which is contained in a namespace that is
> [foreign] to that of the 'main' schema[s] and which a receiver MAY
> have no prior knowledge of. This is a common (ish) approach where a
> schema 'owner' wants to allow users of that schema to add arbitrary
> (or possibly constrained) content without causing existing
> implementations to fail during instance validation (at least if they
> are only using standard schema validation capabilities of mainstream
> parsers).
>
> David Orchard has written much on this subject (as have a few others)
> and also describes 2 variants of the must ignore unknown pattern,
> specifically, 'discard' and 'retain'. As I understand it, the former
> means that unknown content can be both ignored and discarded (not
> passed to upstream processing) without generating an error, and the
> latter that content may be ignored but should *not* be removed. It is
> the 'discard' aspect that came under some challenge when I was
> recently discussing the possible use of this approach. I would be
> interested in this forum's view :-)
>
> The unsurprising challenge was/is this :-
>
> in a situation where :-
> - message data is captured by some application
> - the basic content model for the transaction is defined by a standard
> schema to which all participants agree to conform
> - the standard schema allows for extensibility at various points so
> long as these are defined in a foreign namespace.
> - some of the data captured has been specified by only one provider of
> the service and that provider has arranged with the application owner
> to put that data in the appropriate extensibility area in an agreed
> foreign namespace.
> - the message (including all extension data) will be sent to *all*
> potential service providers of which there may be many
> - the service provider who requested the additional data wants to use
> the standards-based data model, *not* create a completely private
> schema for this transaction.
>
> so ... what should receivers of the message who do *not* understand
> the extension do ?
>
> Are they likely to be obliged (possibly by legal, regulatory, audit,
> ... requirements) to retain ALL data that a customer has agreed to send
> (perhaps for non-repudiation, DPA, or other reasons), regardless of
> whether they intend to process it or not? And if so, does that make
> it a practical non-starter, given that the size and content of 'unknown
> data' requires them to provide an adequate (and equally unknown)
> storage (and retrieval) capability (at least for those business
> transactions to which these sorts of obligations might apply)???
>
> Opinions welcome
>
> Fraser
>
> ==== end of original post ====