xml-dev - Re: [xml-dev] Rationale Behind The PSVI

To: xml-dev <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Rationale Behind The PSVI
From: Jeni Tennison <jeni@jenitennison.com>
Date: Thu, 16 May 2002 14:43:04 +0100
Cc: "Evan Lenz" <evan@evanlenz.net>
In-reply-to: <3CE393FA.98731AC3@rpbourret.com>
Organization: Jeni Tennison Consulting Ltd
References: <8BD7226E07DDFF49AF5EF4030ACE0B7E06621C0C@red-msg-06.redmond.corp.microsoft.com><3CE393FA.98731AC3@rpbourret.com>
Reply-to: Jeni Tennison <jeni@jenitennison.com>

Ron Bourret wrote:
> Agreed. The worst of it is that the PSVI information is just the
> information that appeared useful in the eyes of the authors of the
> schema spec. Different applications will care about different parts
> of it and I'm sure that reasonably clever people can come up with
> information that isn't included at all.

The types of the individual items in a list are one example that
springs to mind, and their absence makes things annoyingly difficult
for the XQuery/XPath 2.0 data model, which is only interested in
atomic types.

Here's an illustration that demonstrates the problem. Say you have:

<xs:simpleType name="commandType">
  <xs:restriction base="xs:token">
    <xs:enumeration value="moveto" />
    <xs:enumeration value="lineto" />
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="commandOrCoordType">
  <xs:union memberTypes="commandType xs:integer" />
</xs:simpleType>

Imagine a commandOrCoord attribute with type commandOrCoordType:

  commandOrCoord="moveto"

In the PSVI, the commandOrCoord attribute information item has a
schema normalized value of "moveto", a type definition of
commandOrCoordType and a member type definition of commandType.

In the XQuery/XPath 2.0 data model, the attribute node has a type of
commandOrCoordType and a typed value with the value 'moveto' and the
type commandType.

Now consider:

<xs:simpleType name="pathType">
  <xs:list itemType="commandOrCoordType" />
</xs:simpleType>

and a path attribute with the type pathType:

  path="moveto 100 300
        lineto 200 400 300 300"

The PSVI contains a path attribute information item with a schema
normalized value of "moveto 100 300 lineto 200 400 300 300", a type
definition of pathType and no member type definition.

To satisfy its requirement that atomic values are always of atomic
types, the XQuery/XPath 2.0 data model needs to translate this into an
attribute node whose typed value is the sequence ('moveto', 100, 300,
'lineto', 200, 400, 300, 300), with the 1st and 4th items being atomic
values of the type commandType and the rest of the items being atomic
values of the type xs:integer.

But XQuery/XPath 2.0 can't get that information directly from the
PSVI. It can split the schema normalized value on whitespace, and it
knows that each of the resulting items is of type commandOrCoordType,
because that's the item type of pathType. But to get to an *atomic
type*, it needs to validate each item individually against
commandOrCoordType to work out which of the member types applies.
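A minimal Python sketch of that per-item step, assuming the two member
types can be checked with simple hand-written tests: the helpers
is_command_type and is_integer are hypothetical stand-ins for a real
schema validator, and the integer check covers only the xs:integer
lexical space. It follows the XML Schema rule that a union value takes
the first member type, in memberTypes order, that accepts it.

```python
import re
from typing import Callable, List, Tuple, Union

Atomic = Union[str, int]

# Hypothetical stand-ins for schema validation of each member type.
def is_command_type(lexical: str) -> bool:
    # commandType: xs:token restricted to the enumeration {moveto, lineto}
    return lexical in ("moveto", "lineto")

def is_integer(lexical: str) -> bool:
    # xs:integer lexical space: optional sign, then one or more digits
    return re.fullmatch(r"[+-]?[0-9]+", lexical) is not None

# Member types of commandOrCoordType, in memberTypes order, each paired
# with a converter from the lexical form to a Python value.
MEMBER_TYPES: List[Tuple[str, Callable[[str], bool], Callable[[str], Atomic]]] = [
    ("commandType", is_command_type, str),
    ("xs:integer", is_integer, int),
]

def atomize_path(normalized_value: str) -> List[Tuple[Atomic, str]]:
    """Split a pathType value and give each item its atomic member type."""
    items: List[Tuple[Atomic, str]] = []
    for token in normalized_value.split():
        for type_name, accepts, convert in MEMBER_TYPES:
            if accepts(token):  # first member type that accepts the item wins
                items.append((convert(token), type_name))
                break
        else:
            raise ValueError(f"{token!r} is not a valid commandOrCoordType")
    return items

typed = atomize_path("moveto 100 300 lineto 200 400 300 300")
print(typed[0], typed[1])  # ('moveto', 'commandType') (100, 'xs:integer')
```

The 1st and 4th items come back as commandType and the rest as
xs:integer, which is exactly the per-item information that the PSVI
doesn't record for a list-typed attribute.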

> One wonders if a general mechanism for augmenting the infoset with
> metadata wouldn't be a better solution, although I can imagine how
> cumbersome that would be for things like a content model.

I'm beginning to think that we need to question the notion of passing
around augmented infosets at all. Constructing an augmented infoset
to meet the particular requirements of a process is fine. The PSVI is
simply the augmented infoset that you get when you validate against an
XML Schema, and that's great: as I've indicated, there are plenty of
things in it that I want to use.

But we need to separate that from the notion that the XML we pass
around is augmented simply because a particular DTD or schema exists.
XML gave us a way out of that by decoupling the document from the DTD.
That decoupling meant that we could view the same set of information
as different augmented infosets depending on what we needed to do
with it. An element isn't, in its essence, valid or invalid; it's
valid or invalid *according to a particular DTD*. An attribute isn't,
in its essence, an ID attribute; it's an ID attribute *according to a
particular DTD*.

That's a very hard step to make because people understandably feel
they need to have a consensual view on a particular document so that
they can ensure that other people are interpreting the information
in the document in the same way as they are. But augmenting infosets
doesn't give you that guarantee -- just because you're both viewing a
particular set of characters as a number doesn't mean that you're not
interpreting it as a height while they interpret it as a distance.

As far as I can tell, the only tools that actually benefit from a
document telling them what its type is are tools that treat XML in a
generic way: editors or viewers that localise dates when they
display them, or that pop up calendars when you try to edit them. As with
stylesheets, associating a default schema with a document is useful
because it allows them to present a default view of the document.

In other applications, you know what you're expecting, you know the
augmentations you need. In fact, it's *dangerous* to accept augmented
infosets that haven't been augmented by you -- who knows what type
someone else might have associated with an attribute? Just because an
element was valid under the schema they used to create the PSVI
doesn't mean it's valid under yours.

I think I'm beginning to see the point Evan was trying to make. If
we're creating safe stylesheets, we need to be able to *guarantee*
that the node trees that we're manipulating are not augmented with
anything but our own augmentations. Take a stylesheet that uses the
id() function to access foo elements, on the assumption that the id
attributes of foo elements are the ones that will be annotated as ID
attributes. Then pass the stylesheet an XML Infoset in which the id
attributes on bar elements are annotated as ID attributes instead. If
we accept that Infoset as the source blindly, the behaviour of the
stylesheet is radically different from what we intended. These
problems are multiplied a thousandfold with the PSVI. It's not enough
to control what schema is used on a source *document*, we need to be
able to control what schema was used to create the nodes that form the
input to the transformation.
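To make the id() example concrete, here is a small Python sketch using
xml.etree.ElementTree, which carries no ID annotations of its own. The
hypothetical build_id_index function takes the rule that decides which
attributes count as IDs as a parameter, so the consumer applies its own
augmentation rather than trusting one that arrived with the tree. The
foo/bar document and both rules are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# A rule deciding whether attribute `name` on `elem` is treated as an ID.
IdRule = Callable[[ET.Element, str], bool]

def build_id_index(root: ET.Element, is_id: IdRule) -> Dict[str, ET.Element]:
    """Map ID values to elements, using only the caller-supplied rule."""
    index: Dict[str, ET.Element] = {}
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if is_id(elem, name):
                # Keep the first element seen for each ID value.
                index.setdefault(value, elem)
    return index

# Hypothetical source document: both foo and bar carry id attributes.
SOURCE = '<doc><foo id="p1">a foo</foo><bar id="p2">a bar</bar></doc>'
root = ET.fromstring(SOURCE)

# The stylesheet's assumption: id attributes on foo elements are IDs.
def our_rule(elem: ET.Element, name: str) -> bool:
    return elem.tag == "foo" and name == "id"

# Someone else's annotation: id attributes on bar elements are IDs.
def their_rule(elem: ET.Element, name: str) -> bool:
    return elem.tag == "bar" and name == "id"

our_ids = build_id_index(root, our_rule)
their_ids = build_id_index(root, their_rule)

print(our_ids["p1"].tag)    # foo
print(their_ids.get("p1"))  # None
print(their_ids["p2"].tag)  # bar
```

Under our rule, 'p1' resolves to the foo element; under the supplier's
rule, 'p1' resolves to nothing and 'p2' picks out the bar element
instead, which is the kind of silent change in behaviour described above.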

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

References:
- Rationale Behind The PSVI
  - From: "Dare Obasanjo" <dareo@microsoft.com>
- Re: [xml-dev] Rationale Behind The PSVI
  - From: Ronald Bourret <rpbourret@rpbourret.com>