


   Re: [xml-dev] How to spell "No PSVI" in XSLT 2.0 without getting sucked


Hi Evan,

> As much as I resonate with the intent of your approach, I think it's
> the wrong one. While there are examples of abuse (e.g.
> M$tripping-whitespace), XSLT 1.0's deliberate de-coupling of data
> model and serialization was a good and successful decision.
> xsl:output is an optional feature for very good reasons. This design
> has made for a remarkable marriage of flexibility and
> interoperability.

Absolutely. I was imagining that xsl:input would similarly represent a
decoupling of parsing issues and transforming issues, and support for
xsl:input be an optional feature. This would provide interoperability
between the high percentage of XSLT processors that provide an
interface by which you can supply simply the URI of a source document
and have them manage the parsing for you.

It strikes me that Elliotte's argument about whether or not XInclude
elements should be processed by an XSLT processor is *precisely* the
kind of issue that would be resolved if there were a recommended method
of controlling the document-to-node-tree process within the stylesheet.

> As far as I see it, there are two things that were *crucial* in
> ensuring the success of this approach:
>   1. There was always an unambiguous, lossless,
>      one-to-one mapping between the data model
>      and its serialized form, and
>   2. with few exceptions, all information in the
>      data model was present in the corresponding
>      serialized *instance* document.
> The PSVI is the antithesis to both of the above. And this is why
> many of us are worried about how interoperable our stylesheets will
> be in a PSVI-oriented world.

First, I don't think the mapping from the data model to its serialized
form is actually one-to-one. For any given set of attribute values on
xsl:output, there are always options: should the processor use UTF-8
or UTF-16? Should it escape characters with decimal or hexadecimal
character references? Should it use single quotes or double quotes
around attribute values?
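
To make that concrete, here is a minimal sketch in Python (chosen only
for illustration; it uses the standard xml.etree.ElementTree parser
rather than an XSLT processor). The three byte strings are made-up
examples that differ in encoding, quote style and character-reference
style, yet they parse to the same tree:

```python
import xml.etree.ElementTree as ET

# Three different serializations of what should be one and the same tree.
doc_a = '<p title="caf&#233;">x</p>'.encode("utf-8")    # double quotes, decimal reference
doc_b = "<p title='caf&#xE9;'>x</p>".encode("utf-8")    # single quotes, hexadecimal reference
doc_c = ('<?xml version="1.0" encoding="UTF-16"?>'
         '<p title="caf\u00e9">x</p>').encode("utf-16")  # UTF-16 with a byte order mark


def tree_summary(data: bytes):
    """Reduce a parsed document to the parts a stylesheet could observe."""
    root = ET.fromstring(data)
    return (root.tag, dict(root.attrib), root.text)


summaries = [tree_summary(d) for d in (doc_a, doc_b, doc_c)]
```

All three summaries compare equal, so the mapping from tree to document
is one-to-many, even though the mapping back is unambiguous.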

Second, you seem to be saying that there's no one-to-one mapping
from a document+schema to a node tree. If that is the case then, as
with xsl:output, those flexibilities that are judged important should
be parameterised. That's already being done through the
flags. Uche and I were suggesting a process-xinclude flag. There might
be others.

Could you expand, though, on your second point? I'm not sure how the
fact that the node tree doesn't contain the entire PSVI impacts on the
ability of information within the stylesheet to control the creation
of the node tree.

> However, I don't think we can dictate a processing chain from the
> stylesheet any more than we could before. Even if we tried, this
> won't buy us interoperability, given the many processing frameworks
> that aren't controlled by "XSLT applications" but nevertheless use
> XSLT processors. If anything, such an attempt will complicate
> interoperability problems in the same way that
> xsl:disable-output-escaping does today.

I disagree. The reason that xsl:disable-output-escaping causes such
problems is that it short-circuits the clean distinction between the
process of building the node tree and the process of serializing that
node tree. I agree that such short-circuiting is really awful, but I'm
not suggesting that there be a special feature whereby users can get
hold of e.g. the declaration of an element despite it not being in the
node tree. What I'm suggesting is that the node-tree-creation process
could, optionally, be controlled from within the stylesheet in a
similarly clean and de-coupled way to the (majority of the)
node-tree-serialisation process.

Just as with xsl:output, if a processor is used in a situation where
it's passed a node tree directly, then it can ignore xsl:input. But
something like xsl:input would provide consistency between the vast
majority of XSLT processors that can be run from the command line
having been given the URL of a source document to transform. We
haven't needed that up till now because, with the exception of XInclude
processing, the set of information that you get out of an XML document
is pretty much fixed. Now that there's a major parameter that affects
the content of the node tree (namely which schema you use to validate
the document), I think we do need it.
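
As a rough sketch of the behaviour I have in mind, again in Python
rather than XSLT: InputOptions, source_tree and toy_loader below are
hypothetical names invented for this illustration, and process_xinclude
merely stands in for the suggested process-xinclude flag. Nothing here
is taken from an XSLT draft. The only library facility used is the
standard xml.etree.ElementInclude module.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from xml.etree import ElementInclude


@dataclass
class InputOptions:
    """Hypothetical stand-in for attributes on an xsl:input element."""
    process_xinclude: bool = False


def source_tree(source, options, loader):
    """Return the node tree a transformation would start from."""
    if isinstance(source, ET.Element):
        # The caller supplied a ready-made tree: the input options are ignored.
        return source
    root = ET.fromstring(source)
    if options.process_xinclude:
        # Expand xi:include elements in place before the transformation starts.
        ElementInclude.include(root, loader=loader)
    return root


def toy_loader(href, parse, encoding=None):
    """Resolve includes from an in-memory table instead of the file system."""
    documents = {"chapter.xml": "<chapter>Included text</chapter>"}
    if parse == "xml":
        return ET.fromstring(documents[href])
    return documents[href]


SOURCE = (
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter.xml"/>'
    "</doc>"
)

unexpanded = source_tree(SOURCE, InputOptions(process_xinclude=False), toy_loader)
expanded = source_tree(SOURCE, InputOptions(process_xinclude=True), toy_loader)
passed_through = source_tree(unexpanded, InputOptions(process_xinclude=True), toy_loader)
```

The same document yields two different trees depending on the flag,
while a tree handed over directly is returned untouched whatever the
options say.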

I get the argument that XSLT stylesheets should only be concerned with
the transformation part of the process. But on the other hand, given
that a major mode of stylesheet processing is a document-to-document
(rather than tree-to-tree) transformation, and that XSLT says
something about the tree-to-document part of the process, I think it's
right that XSLT should say something about the document-to-tree part
of the process.

So I think I've missed the basis of your argument against using an
xsl:input kind of optional control over the parsing process. Could you
try explaining again?

> Rather, what is needed is a way to dictate what *kinds* of
> information can be present in a source/result tree, based on some
> flag in the stylesheet. In particular, the stylesheet writer should
> have a way to switch between plain vanilla XML/Infoset, and PSVI
> with PSVI-specific information items. In short, this is a data model
> issue more than a processing model issue.
> Such a flag would enable a stylesheet to process an XML document as
> vanilla XML, regardless of its processing history. Its processing
> history may or may not include XML Schema validation. In the event
> that it does, the visible PSVI augmentations will be constrained to
> the kinds of information that can occur in the restricted, vanilla
> data model, namely defaulted attributes, etc. This implies a
> straightforward algorithm for interpreting a PSVI as an augmented
> Infoset without the PSVI-specific information items. Such an
> algorithm would be akin to taking the PSVI, serializing the
> instance, and parsing it again without respect to a schema.

When I first read this, I thought you were talking about changing the
underlying data model based on a flag in the stylesheet. But I think
that what you're saying is that there are different levels of
augmentation of the basic XML Infoset that you might be interested in,
even when validating against a schema. You think that there should be
a flag that states that the typing information (i.e. typed value and
type properties on nodes) should be omitted (aside from certain
attributes having the ID type, presumably?).
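
If I've understood the proposal, it amounts to something like the
following toy model, sketched in Python. AnnotatedTree, toy_validate
and to_vanilla are invented for the illustration, the "schema" is faked
by hand, and the real PSVI carries far more than a type name per
element. The point is only that a serialize-and-reparse round trip
keeps defaulted attributes but drops the type annotations.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class AnnotatedTree:
    """Toy stand-in for a PSVI: an element tree plus type annotations."""
    root: ET.Element
    types: dict = field(default_factory=dict)  # element name -> type name


def toy_validate(source: str) -> AnnotatedTree:
    """Pretend schema validation: default one attribute, annotate one type."""
    root = ET.fromstring(source)
    root.attrib.setdefault("currency", "GBP")  # defaulted attribute (assumed schema default)
    return AnnotatedTree(root, {root.tag: "xs:decimal"})


def to_vanilla(psvi: AnnotatedTree) -> AnnotatedTree:
    """Serialize the instance and parse it again without any schema."""
    serialized = ET.tostring(psvi.root, encoding="unicode")
    return AnnotatedTree(ET.fromstring(serialized), {})


psvi = toy_validate("<price>9.99</price>")
vanilla = to_vanilla(psvi)
```

Here vanilla.root still has the defaulted currency attribute, but
vanilla.types is empty, whereas psvi.types records xs:decimal for the
price element.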

I'm assuming (perhaps wrongly) that users will be given the options of
not validating against a schema at all, using a DTD, or using a schema
that they've designed specifically to give them the information they
need in the stylesheet. What do you see as the benefits of giving
users this option -- of validating against a schema but ignoring some
of what that tells you -- as well?



Jeni Tennison



Copyright 2001 XML.org. This site is hosted by OASIS