Re: using namespaces to version

At 2001-05-04 10:05, Tony Coates wrote:
 > On 04/05/2001 15:43:01 C. M. Sperberg-McQueen wrote:

 >> In other words, the relation of namespace to schema is
 >> many-to-many, not one-to-one.  This turns out to be a hard pill for
 >> some to swallow, but I think it is time to accept the logical
 >> consequences of our designs.  (The people I know who want a
 >> one-to-one relation are, as far as I can tell, still fighting the
 >> battles involved in the development of the namespaces rec.  Let it
 >> go, friends! Let it go!)

 > Yes, the intersection between schemas, namespaces, and versioning is
 > certainly proving to be cathartic for me too.  However, in this
 > many-to-many world, how does my application determine which versions
 > of all the many imported and included schemas apply to a particular
 > instance document?

It's definitely handy to be able to know exactly what declarations
were used; that's one reason XML Schema allows a processor to expose
the Schema components to downstream applications as part of the
post-schema-validation infoset.  It's not required (because some
small-footprint processors may not need this functionality), but
it may be something that affects your choice of processor (much the
way DTD-validation or lack thereof, or DTD-awareness or lack
thereof may affect your choice of XML 1.0 processors, or support
for RANK and DATATAG might affect your choice of SGML processors --
positively or negatively).

So, concretely, you consult (in the output of your schema processor)
the information in the [schema information] property of the element
(or the element information item, if you prefer) at which validation
began.  In particular, you want the [schema components] properties
for the namespaces you care about, or -- if all of your declarations
come in via schema documents -- the [schema documents] property.
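
A minimal sketch of that lookup, for illustration only. The property
names in the comments ([schema namespace], [schema components],
[schema documents], [document location]) come from the XML Schema
description of the post-schema-validation infoset; the Python classes
and the function are hypothetical stand-ins and are not the API of any
actual processor.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class SchemaDocumentInfo:
    # Hypothetical model of a schema document information item.
    document_location: str  # [document location]


@dataclass
class NamespaceSchemaInfo:
    # Hypothetical model of one member of [schema information].
    schema_namespace: str  # [schema namespace]
    schema_components: List[str] = field(default_factory=list)  # [schema components]
    schema_documents: List[SchemaDocumentInfo] = field(default_factory=list)  # [schema documents]


def documents_by_namespace(
    schema_information: Iterable[NamespaceSchemaInfo],
    namespaces: Iterable[str],
) -> Dict[str, List[str]]:
    """For each namespace of interest, list the locations of the schema
    documents that contributed declarations to the assessment."""
    wanted = set(namespaces)
    return {
        info.schema_namespace: [d.document_location for d in info.schema_documents]
        for info in schema_information
        if info.schema_namespace in wanted
    }
```

An application that must know exactly which version was used could
record the returned locations alongside the tree it hands on.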

 > Be aware that "your application needs to be flexible and robust" is
 > not a suitable answer, because in the financial world contextual
 > mistakes can be worth millions of dollars.  I need to know exactly
 > which version of each element was used, and I really need this
 > information to be available with the DOM tree that my parser
 > returns.

Agreed. One reason that the mechanism for binding namespaces to
schemas is left so wide open in XML Schema 1.0 is that applications
should not HAVE TO be "flexible and robust" in accepting data.

In some cases (including, I suspect, the financial world), it is even
more handy not just to know afterwards what declarations were used,
but to be able to control which declarations are used.  I see from
your later postings to xml-dev that this is not what you are in fact
after, though I agree with Len Bullard that this is going to be
important for lots of applications.

I think (speaking for myself) that there is room for experimentation
here, and certainly a need for experience with different ways of going
about it.  I can imagine several rules, which might be combined in
different orders.  (N.B. the order in which I list these rules is
random, not intended as a proposed ordering.)  In fact, there are
several distinct *kinds* of rules:


   A hard-code the schemas for the namespaces you care about into your
     software (hard-coding a schema into your software -- what a lovely
     way of showing you care ... sounds dumb, but it's the way HTML
     browsers have worked for a long time)

   B hard-code a list of schema locations for the namespaces you care
     about; your software consults those locations for the version
     you consider right

   C assume that every element is declared with the ur-type, and every
     attribute with the ur-simple-type (useful mostly as a fallback,
     I guess)

Overt user control:

   D pass a set or sequence of (namespace-name, schema document) pairs
     to the processor at invocation time, e.g. as a command-line option

   E when you need a schema, ask the user (interactively?) where to
     get it (some XML editors do this with DTDs, if they can't find it
     any other way -- this plan is probably most useful as the last
     attempt in a series, when all else fails)

Trust the document:

   F dereference the namespace name and take what the server gives you
     (or: then use RDDL to find what you want)

   G dereference the URIs given as schemaLocation hints and take what
     the server gives you

Indirection through catalogs or paths:

   H use a series of regular expressions into which you substitute
     the namespace name, or parts of the namespace name, or the
     URI given as a schemaLocation hint, or parts of it, and treat each
     one as a system identifier (the way the sgml-public-map variable
     works in psgml, or the way sgmls entity resolution used to work
     before catalog support)

   I use the namespace name (viewed as a PUBLIC identifier) to consult an
     SGML-Open catalog file

   J use the namespace name (viewed as a SYSTEM identifier) to consult an
     SGML-Open catalog file

   K use the schemaLocation hint (viewed as a SYSTEM identifier) to
     consult an SGML-Open catalog file

   L use the namespace name or schemaLocation hint to consult an
     SGML-Open catalog file looking for a NAMESPACE or SCHEMA keyword
     (this is an extension, for now, but OASIS might be persuaded to
     build some keyword into the new version of catalogs)

   M use the system identifier you got from the SGML-Open catalog to
     consult the SGML-Open catalog file again (keep going until you
     have a system identifier for which the catalog owner has not
     provided a redirection)

   N accept an invocation-time parameter which specifies where to
     find catalog files
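
As an illustration of how a few of the rules listed above might be
combined in a chosen order, here is a minimal sketch. It assumes a rule
can be modelled as a function from (namespace name, schemaLocation
hint) to a system identifier or None. All function names are
hypothetical, the catalog is a plain dictionary rather than a real
SGML-Open catalog parser, and nothing is actually dereferenced.

```python
from typing import Callable, Dict, List, Optional

# A rule maps (namespace name, schemaLocation hint) to a system
# identifier for a schema document, or None if it has no answer.
Rule = Callable[[str, Optional[str]], Optional[str]]


def hard_coded_locations(table: Dict[str, str]) -> Rule:
    """Rule B: a fixed list of schema locations per namespace."""
    return lambda ns, hint: table.get(ns)


def invocation_pairs(pairs: Dict[str, str]) -> Rule:
    """Rule D: (namespace-name, schema document) pairs supplied at
    invocation time."""
    return lambda ns, hint: pairs.get(ns)


def catalog(entries: Dict[str, str]) -> Rule:
    """Rules J, K and M: look up the namespace name, then the hint, as a
    system identifier, and follow redirections until none is left."""
    def rule(ns: str, hint: Optional[str]) -> Optional[str]:
        for key in (ns, hint):
            if key in entries:
                seen = {key}  # guard against circular catalog entries
                target = entries[key]
                while target in entries and target not in seen:
                    seen.add(target)
                    target = entries[target]
                return target
        return None
    return rule


def trust_hint() -> Rule:
    """Rule G, minus the network access: use the hint as given."""
    return lambda ns, hint: hint


def resolve(ns: str, hint: Optional[str], rules: List[Rule]) -> Optional[str]:
    """Try each rule in order; the first one with an answer wins."""
    for rule in rules:
        found = rule(ns, hint)
        if found is not None:
            return found
    return None
```

Reordering the list passed to resolve() changes which source of
information takes precedence, which is the point of keeping the rules
separate from their ordering.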

Not all of these are equally useful (I have always hated systems that
made me work with option H), and if some catalog support (N and one or
more of I-L, and optionally M) is provided, it's not clear that
run-time options (D, E) will be needed for most users.  For some
applications, hard-coding things may be the way to go, for the
namespaces one cares most about: it depends on the deployment

 > The situation you describe sounds fine if you don't expect to have
 > to version your elements, because their meaning isn't expected to
 > change with time.  This will be true in some areas, but certainly
 > completely incorrect for others.  You cannot always invent a new
 > name just because the semantics change in some way.

Versioning seems to be a hard challenge -- I get the impression it's
hard in part because we want contradictory things from a versioning
mechanism: sometimes the test of a successful versioning mechanism
appears to be that it allows us to label both version 1 and version 2
of language X as just 'X', so we don't need to change all the labels
in our data when we move to a new processor (or vice versa), and at
other times it appears that the test of a successful versioning
mechanism is that it allows us to tell specifically which version(s)
of a specification our data, or software, is compatible with, so that we
can fail quickly and avoid catastrophic errors (the processor for
Boolean-Language 1.0 failed on the Boolean-2.0 data, because it
ignored the new alternate notation for 'not' -- or rather, it didn't
fail, it only produced catastrophically erroneous results, smiling
cheerfully all the while).

But I think both forms of versioning can be supported if we give the
user sufficient control over which schemas get bound to which
namespaces.

Case 1.  Language X has two versions, labeled 'X' and 'X'
respectively.  I know my data is in version 2, so I tell the processor
to use the schema for version 2 -- that is, I tell the processor
to bind namespace X to schema X2.

Case 2.  Language X has two versions, labeled 'X' and 'X'
respectively.  I don't know which version is used by the data you just
sent me, but I need to know, so I tell the outermost layer on my
system to try first using the schema for version 2, and if that
doesn't work then try the schema for version 1, and let me know
which worked.
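
A minimal sketch of the Case 2 procedure, assuming each schema version
can be represented by a function that reports whether a document is
valid against it. The two toy validators and their element names are
invented for the example; a real system would call its schema
processor instead.

```python
from typing import Callable, List, Optional, Set, Tuple

# A validator reports whether a document is valid against one schema.
# Documents are modelled here as the set of element names they contain.
Validator = Callable[[Set[str]], bool]


def first_matching_version(
    document: Set[str],
    candidates: List[Tuple[str, Validator]],
) -> Optional[str]:
    """Try each (label, validator) pair in order and return the label of
    the first schema the document validates against, or None."""
    for label, validate in candidates:
        if validate(document):
            return label
    return None


# Hypothetical stand-ins for the two versions of language X.
def valid_v1(doc: Set[str]) -> bool:
    return doc == {"trade", "price"}


def valid_v2(doc: Set[str]) -> bool:
    return doc == {"trade", "price", "currency"}


# Version 2 is tried first, then version 1, as described in Case 2.
CANDIDATES = [("X version 2", valid_v2), ("X version 1", valid_v1)]
```

If the versions overlap, so that some documents are valid against both,
the order of the candidates decides which label is reported. A result
of None gives the caller the chance to fail quickly instead of
processing data it does not understand.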

Case 3.  As for Case 2, but I don't actually care which version
of language X is used by the data you just sent me.  Either I
do the same as in Case 2 (but ignore the information about which
version it was), or I tell the processor to use the 'any-X' version of
the schema, which accepts all documents in either version.  (If the
schema language I am using has the 'determinism' rule we inherited
from ISO 8879, the union schema may also find it necessary to accept a
few documents which aren't legal in either version -- one reason some
people would like the community to move away from the determinism rule.)

Case 4.  Language X has two versions, with namespace names X1 and X2.
I know my data is in X2, and I tell the processor to bind the schema
for X2 to the namespace name X2.

Case 5.  Language X has two versions, X1 and X2, and I don't want to
maintain two schemas for it, so I write a union schema and bind both
namespace names to it.  (Or I say "I don't care what other people do,
I'm binding both X1 and X2 to the schema for X2", much the same way
that on my system the catalog entry for HTML 4.0 Transitional points
to the DTD for HTML 4.0 Strict -- or did for a while.)
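
Cases 4 and 5 can be pictured as entries in a user-controlled binding
table. The sketch below is illustrative only: the Bindings class and
the schema file names are hypothetical, and the namespace names X1 and
X2 are the placeholders used in the cases above.

```python
from collections import defaultdict
from typing import Dict, List


class Bindings:
    """A many-to-many table of namespace names and schemas."""

    def __init__(self) -> None:
        self._schemas: Dict[str, List[str]] = defaultdict(list)

    def bind(self, namespace: str, schema: str) -> None:
        """Bind a schema to a namespace name (duplicates are ignored)."""
        if schema not in self._schemas[namespace]:
            self._schemas[namespace].append(schema)

    def schemas_for(self, namespace: str) -> List[str]:
        """Schemas bound to a namespace name, in binding order."""
        return list(self._schemas.get(namespace, []))

    def namespaces_for(self, schema: str) -> List[str]:
        """Namespace names bound to a schema, sorted alphabetically."""
        return sorted(ns for ns, bound in self._schemas.items() if schema in bound)


# Case 4: one schema per namespace name.
case4 = Bindings()
case4.bind("X2", "schema-for-X2.xsd")

# Case 5: a single union schema bound to both namespace names.
case5 = Bindings()
case5.bind("X1", "union-of-X1-and-X2.xsd")
case5.bind("X2", "union-of-X1-and-X2.xsd")
```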

 > Nor do I really see us wanting to have a separate namespace for each
 > individual element (and wouldn't that be a great bandwidth blow-out
 > for our instance documents).  There seems to be a piece of the
 > puzzle that is missing, certainly for enterprise usage.

Versioning is certainly still an ongoing challenge.  How to label
things so that we can, as far as possible, allow old data to work with
new processors, allow old processors to work with new data when they
can do so safely, and allow old processors to detect, reliably, when
they need to fail on new data rather than risk processing it -- if
anyone has solved that problem, a lot of us would like to know how.

-C. M. Sperberg-McQueen
  speaking for himself