OASIS Mailing List Archives

 



 


 

   Finally, what if namespaces == document types ?


> It has nothing to do with schemas or schema languages. There is no data
> model for namespaces other than "set of names". XSLT uses them as a way
> of having literal elements. WSDL uses them as a way of defining
> WSDL-specific extensions. In theory, XHTML could use them as a way of
> adding new renderable elements. In each circumstance you would combine
> the schemas in a different way.
>
> This wide variety of usage patterns came about because we were told that
> "namespaces don't mean anything, they just name things." But then XML
> Schema came along, and it did treat them as if they were somewhat
> meaningful, because each schema had a targetNamespace. That's okay,
> because you can combine schemas so it is primarily just a file layout
> issue. Now RDDL comes along and tries to make it easy to find "the"
> schema for a namespace. Or "the" XSLT for a namespace. But there is no
> "right" schema for a namespace. The right one depends on the document
> type (i.e. what the document means, all together) and what the recipient
> is trying to do with the document.
>
>
> So of RDDL's three goals, #3 doesn't seem practically achievable until
> and unless we define a data model for namespaces and define the
> semantics of namespace combination. If we do not do that, then HTML is
> probably a sufficient referent.

I totally agree with you. But what interests me here is your point that XML
Schema treats namespaces as if they were meaningful. That is what I replied
to Simon St.Laurent in a previous mail (I can't find it in the archives; it
seems that the archives have lost a few posts...). Simon was against defining
document types, and wanted to use namespaces differently so that they could
be treated as document types. We are now at a point where we have to choose
between two scenarios:

1) Namespaces are just sets of names with no additional semantics; they are
50% of QNames and that's all.

In this scenario, schemas can span multiple namespaces because the semantics
of a document are defined by its schema, not the namespaces it uses.
WARNING: as in all of my previous posts, 'schema' means a set of
constraints on the structure of a document. A schema can be written as a
DTD, an XML Schema, a RELAX NG schema, etc. In this world, we have to define
the concept of 'document type' so that we may associate meta-data (schemas,
stylesheets, code, etc.) with a set of documents that share the same
semantics (the same abstract schema). In this world, associating schemas with
namespaces is nonsense; RDDL is still useful as a namespace description
format, and XML Schema is severely impaired (because you can't write the XML
Schema of a document which mixes namespaces, like RDDL, WAP 2.0, a SOAP
request, etc.).
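
To make "50% of QNames" concrete, here is a minimal Python sketch. It only
assumes the standard library's ElementTree, which reports each element name
as '{namespace}local'. The namespace URIs and element names are made up for
the illustration.

```python
import xml.etree.ElementTree as ET

def split_qname(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag

# Hypothetical document mixing two made-up namespaces.
DOC = """<doc xmlns="urn:example:doc" xmlns:x="urn:example:extra">
  <title>Hello</title>
  <x:note>mixed in</x:note>
</doc>"""

root = ET.fromstring(DOC)

# Each name is a pair; the namespace is exactly one half of it.
names = [split_qname(element.tag) for element in root.iter()]

# In scenario 1 this set says nothing about the structure of the document.
namespaces = {namespace for namespace, _ in names}
```

In scenario 1, nothing more than this pair can be derived from a namespace;
the structure has to come from a schema attached to something else.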

2) Namespaces are meaningful. They are not only 50% of a QName.

Some programs can implement algorithms based on the namespaces only, and
still produce interesting results. For example, a browser, when encountering
an unknown namespace, could delegate the rendering of the subtree beginning
at the foreign element to a plugin that could be dynamically downloaded.

This is possible because an abstract schema is bound to the namespace, so
that it is possible to write code that depends only on the namespace and its
intrinsic schema. A renderer plugin could process the subtree because it
would know the elements and structure of elements that the namespace schema
enforces. The enclosing schema would not have any means of changing this
schema, so the plugin would never find unexpected data.

[ This would not be possible in scenario 1, where the document structure is
totally unrelated to the namespaces, so it would be impossible to associate
some code to a namespace and expect any namespace-inherent structure. There
could be situations in which some elements would be recognized, but found in
a totally different context than the plugin could expect, thus causing
failures or worse, incorrect results. ]
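
The browser example can be sketched as follows. This is only an
illustration under assumptions: the namespaces are made up, PLUGINS is a
hypothetical in-memory table standing in for dynamically downloaded plugins,
and "rendering" just produces a string.

```python
import xml.etree.ElementTree as ET

HOST_NS = "urn:example:host"   # hypothetical host vocabulary
MATH_NS = "urn:example:math"   # hypothetical embedded vocabulary

def namespace_of(element):
    """Return the namespace part of an ElementTree tag, or None."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None

def render_math(subtree):
    # Stand-in for a plugin: it relies only on the namespace's own schema.
    return "[math: %d elements]" % sum(1 for _ in subtree.iter())

# Hypothetical plugin table keyed by namespace only.
PLUGINS = {MATH_NS: render_math}

def render(element, host_ns=HOST_NS):
    namespace = namespace_of(element)
    if namespace != host_ns:
        plugin = PLUGINS.get(namespace)
        if plugin is None:
            return "[no plugin for %s]" % namespace
        return plugin(element)  # the whole foreign subtree is handed over
    text = (element.text or "").strip()
    parts = [text] if text else []
    for child in element:
        parts.append(render(child, host_ns))
    return " ".join(parts)

DOC = """<page xmlns="urn:example:host" xmlns:m="urn:example:math"
      xmlns:u="urn:example:unknown">
  <p>Area:</p>
  <m:expr><m:var>r</m:var><m:op>*</m:op><m:var>r</m:var></m:expr>
  <u:widget/>
</page>"""

rendered = render(ET.fromstring(DOC))
```

The host renderer never looks inside the foreign subtree, which is only safe
if the enclosing schema cannot alter what the namespace's schema allows.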

Note that this model could easily be extended to support multiple schemas,
depending on the first element encountered. In that case, a namespace would
have one abstract schema per element that can stand alone or be embedded
in a foreign document.

In this scenario, XML Schema would be at ease, not limited by its unique
targetNamespace. Indeed, schemas would never span multiple namespaces, to
ensure the proper operation of all the code that depends on namespaces.
However, schemas could be extensible and integrate multiple namespaces by
delegating the validation of foreign namespaces to their appropriate schema
(based on the element encountered). We would have to find a way to
standardize this delegation mechanism. Moreover, as we cannot expect all
schemas for all namespaces to be written in XML Schema, this delegation
mechanism should support cross-schema-language delegation. This would result
in the "namespace-insulation" of schemas, each schema being assigned a
single targetNamespace, yet having interfaces with other namespaces through
delegation mechanisms.
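
Here is a rough Python sketch of what such a delegation mechanism could look
like. Everything in it is an assumption made for the illustration: the
REGISTRY table, the register() helper and the two namespaces are
hypothetical, and the validators are plain functions standing in for real
XML Schema or RELAX NG validators.

```python
import xml.etree.ElementTree as ET

REGISTRY = {}  # (namespace, local name) of a root element -> validator

def qname(element):
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag

def register(namespace, local):
    def decorator(func):
        REGISTRY[(namespace, local)] = func
        return func
    return decorator

def validate(element):
    """Validate a subtree with the validator keyed by its root QName."""
    validator = REGISTRY.get(qname(element))
    if validator is None:
        return ["no schema registered for {%s}%s" % qname(element)]
    return validator(element)

def delegate_foreign(element, own_ns):
    # Direct children only, to keep the sketch short.
    errors = []
    for child in element:
        if qname(child)[0] != own_ns:
            errors.extend(validate(child))
    return errors

ORDER_NS = "urn:example:order"    # hypothetical
ADDR_NS = "urn:example:address"   # hypothetical

@register(ORDER_NS, "order")
def validate_order(element):
    # Imagine this one backed by an XML Schema.
    errors = []
    if element.get("id") is None:
        errors.append("order: missing id attribute")
    return errors + delegate_foreign(element, ORDER_NS)

@register(ADDR_NS, "address")
def validate_address(element):
    # Imagine this one backed by a RELAX NG schema.
    cities = element.findall("{%s}city" % ADDR_NS)
    return [] if len(cities) == 1 else ["address: expected exactly one city"]

GOOD = ('<o:order xmlns:o="urn:example:order" xmlns:a="urn:example:address"'
        ' id="1"><a:address><a:city>Paris</a:city></a:address></o:order>')
BAD = ('<o:order xmlns:o="urn:example:order" xmlns:a="urn:example:address">'
       '<a:address/></o:order>')

good_errors = validate(ET.fromstring(GOOD))
bad_errors = validate(ET.fromstring(BAD))
```

Each validator knows only its own namespace; what it meets from elsewhere
goes back through the registry, whatever language the other schema uses.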

There is another example of namespace usage, maybe more interesting than
plugins for rendering given the current news about web services.
Namespace-based content validation, dispatching and processing would
radically change the way web services could be implemented. We could have
general validation for any SOAP request, for example, each part of the SOAP
message being validated by an appropriate schema: the SOAP envelope by the
SOAP schema, the request by the schema associated with the namespace and name
of its root element, etc.
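
As a small illustration, the Python sketch below splits a SOAP 1.1 message
into the parts that would each be routed to their own schema. The request
namespace and element names are made up, and schema_keys() is a hypothetical
helper; only the SOAP 1.1 envelope namespace URI is real.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope

def qname(element):
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag

def schema_keys(envelope):
    """List (part, root QName) pairs; each QName is the key to one schema."""
    keys = [("envelope", qname(envelope))]
    for part in ("Header", "Body"):
        container = envelope.find("{%s}%s" % (SOAP_NS, part))
        if container is not None:
            for entry in container:
                keys.append((part.lower(), qname(entry)))
    return keys

# Hypothetical request in a made-up namespace.
MESSAGE = """<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <q:GetQuote xmlns:q="urn:example:quotes">
      <q:symbol>XYZ</q:symbol>
    </q:GetQuote>
  </soap:Body>
</soap:Envelope>"""

keys = schema_keys(ET.fromstring(MESSAGE))
```

A generic SOAP endpoint could validate the envelope once, then look up the
schema for each remaining key without knowing the service in advance.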

Of course, in this scenario, typing information would be closely related to
namespaces, so RDDL would be the perfect place to list all the schemas of a
namespace, not forgetting to specify the root elements of each schema.

Note that this delegation mechanism and the "namespace-insulation" property
of scenario 2 have an equivalent in scenario 1. This would be implemented by
having the schemas reference other document types defined in other schemas.
Likewise, for better integration, we would have to design a delegation
mechanism that would allow schemas in one language to ask for the validation
of foreign document types with schemas in other languages.

Now let's try to compare the two scenarios.

- Scenarios 1 and 2 seem equally powerful.

- They both require some work on the schema validation process, to design
this cross-schema-language validation mechanism.

- Scenario 2 has the advantage of not requiring us to throw away the work
done on XML Schema. Though XML Schema is not the only schema language
available, far from it, a lot of technologies have standardized on top of it.
Would it be wise to throw it away, as scenario 1 seems to require?

- Scenario 2 is also the simplest way to handle the nasty habit some specs
have of adding supposedly "out-of-band" special attributes to documents.
XML Schema, for example, lets an xsi:schemaLocation attribute be added to the
root element of a document. In scenario 1, we could either ignore such an
additional attribute, leaving a hole open in the schema that allows any kind
of foreign attribute to be added to the document without breaking its
validity, or we could specify this attribute in the abstract schema of the
corresponding document type, which would not be satisfying either (it would
not scale to other specifications like XLink). In scenario 2, beyond root
elements, we could define root patterns as keys for schemas. Instead of
saying "for each foo:bar, the schema must be so and so", we could say "for
each element that has an xlink:href attribute, the schema must be so and
so". This way, we could validate 'squatter' attributes without touching
the original document schema.

- On the other hand, scenario 2 will break some current schemas that span
multiple namespaces. DTDs like the DTD for RDDL, WAP 2.0 et al. would not be
accepted as proper schemas since they don't respect the namespace insulation
principle. With luck, we could rewrite those schemas to respect the
namespace insulation principle. If we cannot, then we're stuck: the schema
is broken and can't be properly used in the new XML world created by
choosing scenario 2. But what was broken can be rebuilt differently...

- Finally, scenario 1 requires some more work on the concept of document
type, whereas scenario 2 reuses and leverages the concept of namespace.
Defining the concept of document type is not necessarily difficult; in fact
I suspect it is a set of schemas, whereas a namespace is a set of names, so
we could have document type URIs, and so on. But this would be a new object
to define, on top of namespaces.
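
To illustrate the "root patterns" idea from the comparison above (a schema
keyed by "any element that has an xlink:href attribute" rather than by a
root element name), here is a small Python sketch. ROOT_PATTERNS and the
check function are hypothetical; only the XLink namespace URI is real.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = "{%s}href" % XLINK_NS

def has_xlink_href(element):
    """Root pattern: matches any element carrying an xlink:href attribute."""
    return XLINK_HREF in element.attrib

def check_xlink(element):
    # Stand-in for the schema bound to the pattern.
    href = element.get(XLINK_HREF)
    return [] if href.strip() else ["empty xlink:href on <%s>" % element.tag]

# Hypothetical table of (pattern, schema check) pairs.
ROOT_PATTERNS = [(has_xlink_href, check_xlink)]

def validate_patterns(root):
    errors = []
    for element in root.iter():
        for matches, check in ROOT_PATTERNS:
            if matches(element):
                errors.extend(check(element))
    return errors

DOC = """<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <ref xlink:href="http://example.org/a"/>
  <ref xlink:href=""/>
  <para>no link here</para>
</doc>"""

errors = validate_patterns(ET.fromstring(DOC))
```

The schema for doc itself is never consulted or modified here; the
attribute is validated wherever it happens to appear.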

So, which scenario should we choose? It is quite a surprise to me that,
after battling so fiercely against the 'namespace == document type' belief, I
see so much advantage to it... Let me say things clearly and try to sum them
up. In the current state of XML specifications and standards:

1) namespace != document type, except maybe for XML Schema, which has a
different belief
2) RDDL cannot be used to obtain schemas for a given XML document, so we
have to create a document type
3) An alternative to the document type creation is to play a what-if game
about 'namespace==document type'. Scenario 1 is 'namespace!=document type,
so let's create document types'. Scenario 2 is 'namespace==document type, so
what is XML becoming?'.
4) Scenarios 1 and 2 are equally powerful, i.e. what can be done in scenario
1 can be done in scenario 2. This is what I feel; I don't have any proof.
Ideas and protests are welcome here.
5) However, the price to pay for scenario 1 and scenario 2 seems different.
Scenario 1 will save some schemas that could not be rewritten within the
namespace insulation constraints of scenario 2, but adds a new concept and
prohibits the generalised use of XML Schema. Scenario 2 reuses the concept of
namespaces, allows the use of XML Schema and could handle "parasite
attributes" quite nicely, but will force a massive rewrite of all schemas
that contain a mix of namespaces, as well as force everybody to think twice
about their usage of namespaces (which would no longer be only 50% of a
QName).

Ideas and remarks are welcome. Do you at least agree that we'll need to
choose between those two scenarios and standardize on this choice, instead
of having some key technologies assume scenario 2 (XML Schema, RDDL) and
others assume scenario 1 (XML Namespaces, XHTML Modularisation, etc.)?

Best regards,
Nicolas Lehuen
http://nicolas.lehuen.com/






 


Copyright 2001 XML.org. This site is hosted by OASIS