OASIS Mailing List Archives



   RE: [xml-dev] How to spell "No PSVI" in XSLT 2.0 without getting sucked into the Processing Model black hole



You are making the assumption that in most cases the XSLT processor has
control over the parsing (and/or serialization). I am saying that this is an
inappropriate assumption and flies in the face of any pipeline-processing
model, such as found in Cocoon or any application that uses an API such as
JAXP. Adding more *optional* features will not help interoperability and
will do nothing to *ensure* that a source tree is treated as nothing more
than plain vanilla elements, attributes, and text.
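
To make the pipeline point concrete, here is a minimal sketch in Python (not from the original thread; ElementTree stands in for an XSLT processor's tree API, and the function names are hypothetical). Two upstream stages parse the same document differently, and the transform stage sees only whatever tree it is handed:

```python
import xml.etree.ElementTree as ET

SOURCE = "<doc><!-- reviewed --><p>text</p></doc>"


def upstream_plain(text):
    # Default ElementTree parsing drops comments before the tree is built.
    return ET.fromstring(text)


def upstream_with_comments(text):
    # Another pipeline stage may choose to keep comments (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(text, parser=parser)


def transform(root):
    # Stand-in for the stylesheet: it only sees the tree it is given
    # and has no say in how that tree was parsed.
    return ["#comment" if child.tag is ET.Comment else child.tag
            for child in root]


plain_view = transform(upstream_plain(SOURCE))
comment_view = transform(upstream_with_comments(SOURCE))
print(plain_view)
print(comment_view)
```

The same transform reports different children for the same source document, because the parsing decision was made before the transform was ever invoked.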

XSLT's data-model-centric view could succeed in practice only if there were
never any doubt about the corresponding serialization. The XSLT/XPath 1.0
data model has an unambiguous mapping to its serialized form. In particular,
the inherent constraints within the data model prevent a tree from ever
being constructed that does not correspond to a real XML document (or
external general parsed entity, EGPE).
This is even more than can be said of the Infoset (because of its redundancy
in modeling namespaces). The XPath 1.0 data model can always be
round-tripped between abstract node tree and serialization; that is the
point I'm trying to make. The same cannot be said of the PSVI (or of the
current XPath 2.0 data model).
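
The round-trip property can be illustrated with a small Python sketch. This is an approximation under stated assumptions: ElementTree is not the XPath 1.0 data model (it drops comments and processing instructions by default, for instance), and the `shape` and `round_trip` helpers are hypothetical names introduced here. For a tree of plain elements, attributes, and text, serializing and re-parsing yields an equivalent tree:

```python
import xml.etree.ElementTree as ET


def shape(elem):
    # Reduce an element to a comparable structure: name, attributes,
    # text content, trailing text, and children (recursively).
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        elem.text,
        elem.tail,
        [shape(child) for child in elem],
    )


def round_trip(root):
    # Tree -> serialized document -> tree again.
    return ET.fromstring(ET.tostring(root, encoding="unicode"))


root = ET.fromstring(
    '<order id="7"><item qty="2">pen</item><note>a &lt; b</note></order>'
)
again = round_trip(root)
print(shape(root) == shape(again))
```

Nothing in the tree lives outside the serialized instance, so nothing is lost on the way through.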

XSLT 2.0 has pretty much committed to providing support for PSVI pipelines.
Unless there is a mode that restricts input to vanilla XML (for lack of a
better term), the mere presence of PSVIs will destroy the possibility of
robust vanilla XML pipelines.
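
As a rough sketch of what a vanilla mode could mean, the Python below reduces an annotated tree by serializing it and re-parsing it without a schema. Everything schema-related here is an assumption for illustration: `fake_validate` is a hypothetical stand-in for schema validation (the Python standard library has no XML Schema support), and the side table of type names only mimics PSVI type annotations.

```python
import xml.etree.ElementTree as ET


def fake_validate(root):
    # Hypothetical stand-in for schema validation (not real XML Schema).
    types = {}
    for item in root.iter("item"):
        # Defaulted attribute: ends up in the instance itself.
        item.attrib.setdefault("currency", "USD")
        # Type annotation: lives outside the instance, in a side table.
        types[item] = "xs:decimal"
    return types


def to_vanilla(root):
    # Serialize the instance and parse it again with no schema involved.
    return ET.fromstring(ET.tostring(root, encoding="unicode"))


root = ET.fromstring("<order><item>9.50</item></order>")
types = fake_validate(root)
vanilla = to_vanilla(root)

vanilla_item = vanilla.find("item")
print(vanilla_item.get("currency"))  # the defaulted attribute survives
print(vanilla_item in types)         # the type annotation does not
```

The defaulted attribute survives because it can be expressed in plain XML; the type annotation does not, because it has no place in the serialized instance.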


> -----Original Message-----
> From: Jeni Tennison [mailto:jeni@jenitennison.com]
> Sent: Tuesday, May 14, 2002 5:07 AM
> To: Evan Lenz
> Cc: uche.ogbuji@fourthought.com; xml-dev@lists.xml.org
> Subject: Re: [xml-dev] How to spell "No PSVI" in XSLT 2.0 without
> getting sucked into the Processing Model black hole
>
> Hi Evan,
>
> > As much as I resonate with the intent of your approach, I think it's
> > the wrong one. While there are examples of abuse (e.g.
> > M$tripping-whitespace), XSLT 1.0's deliberate de-coupling of data
> > model and serialization was a good and successful decision.
> > xsl:output is an optional feature for very good reasons. This design
> > has made for a remarkable marriage of flexibility and
> > interoperability.
>
> Absolutely. I was imagining that xsl:input would similarly represent a
> decoupling of parsing issues and transforming issues, and support for
> xsl:input be an optional feature. This would provide interoperability
> between the high percentage of XSLT processors that provide an
> interface by which you can supply simply the URI of a source document
> and have them manage the parsing for you.
>
> It strikes me that Elliotte's argument about whether XInclude elements
> should be processed by an XSLT processor or not is *precisely* the kind
> of issue that would be resolved if there were a recommended method of
> controlling the document-to-node-tree process within the stylesheet.
>
> > As far as I see it, there are two things that were *crucial* in
> > ensuring the success of this approach:
> >
> >   1. There was always an unambiguous, lossless,
> >      one-to-one mapping between the data model
> >      and its serialized form, and
> >
> >   2. with few exceptions, all information in the
> >      data model was present in the corresponding
> >      serialized *instance* document.
> >
> > The PSVI is the antithesis to both of the above. And this is why
> > many of us are worried about how interoperable our stylesheets will
> > be in a PSVI-oriented world.
>
> First, I don't think the mapping from the data model to its serialized
> form is actually one to one. For any given set of attribute values on
> xsl:output, there are always options: should the processor use UTF-8
> or UTF-16? Should it escape characters with decimal or hexadecimal
> character references? Should it use single quotes or double quotes
> around attribute values?
>
> Second, you seem to be saying that there's no one-to-one mapping
> from a document+schema to a node tree. If that is the case then, as
> with xsl:output, those flexibilities that are judged important should
> be parameterised. That's already being done through the
> ignore-whitespace/ignore-comments/ignore-processing-instructions
> flags. Uche and I were suggesting a process-xinclude flag. There might
> be others.
>
> Could you expand, though, on your second point? I'm not sure how the
> fact that the node tree doesn't contain the entire PSVI impacts on the
> ability of information within the stylesheet to control the creation
> of the node tree.
>
> > However, I don't think we can dictate a processing chain from the
> > stylesheet any more than we could before. Even if we tried, this
> > won't buy us interoperability, given the many processing frameworks
> > that aren't controlled by "XSLT applications" but nevertheless use
> > XSLT processors. If anything, such an attempt will complicate
> > interoperability problems in the same way that
> > xsl:disable-output-escaping does today.
>
> I disagree. The reason that xsl:disable-output-escaping causes such
> problems is that it short-circuits the clean distinction between the
> process of building the node tree and the process of serializing that
> node tree. I agree that such short-circuiting is really awful, but I'm
> not suggesting that there be a special feature whereby users can get
> hold of e.g. the declaration of an element despite it not being in the
> node tree. What I'm suggesting is that the node-tree-creation process
> could, optionally, be controlled from within the stylesheet in a
> similarly clean and de-coupled way to the (majority of the)
> node-tree-serialisation process.
>
> Just as with xsl:output, if a processor is used in a situation where
> it's passed a node tree directly, then it can ignore xsl:input. But
> something like xsl:input would provide consistency between the vast
> majority of XSLT processors that can be run from the command line
> having been given the URL of a source document to transform. We
> haven't needed that up until now because, with the exception of XInclude
> processing, the set of information that you get out of an XML document
> is pretty much fixed. Now that there's a major parameter that affects
> the content of the node tree (namely which schema you use to validate
> the document), I think we do need it.
>
> I get the argument that XSLT stylesheets should only be concerned with
> the transformation part of the process. But on the other hand, given
> that a major mode of stylesheet processing is a document-to-document
> (rather than tree-to-tree) transformation, and that XSLT says
> something about the tree-to-document part of the process, I think it's
> right that XSLT should say something about the document-to-tree part
> of the process.
>
> So I think I've missed the basis of your argument against using an
> xsl:input kind of optional control over the parsing process. Could you
> try explaining again?
>
> > Rather, what is needed is a way to dictate what *kinds* of
> > information can be present in a source/result tree, based on some
> > flag in the stylesheet. In particular, the stylesheet writer should
> > have a way to switch between plain vanilla XML/Infoset, and PSVI
> > with PSVI-specific information items. In short, this is a data model
> > issue more than a processing model issue.
> >
> > Such a flag would enable a stylesheet to process an XML document as
> > vanilla XML, regardless of its processing history. Its processing
> > history may or may not include XML Schema validation. In the event
> > that it does, the visible PSVI augmentations will be constrained to
> > the kinds of information that can occur in the restricted, vanilla
> > data model, namely defaulted attributes, etc. This implies a
> > straightforward algorithm for interpreting a PSVI as an augmented
> > Infoset without the PSVI-specific information items. Such an
> > algorithm would be akin to taking the PSVI, serializing the
> > instance, and parsing it again without respect to a schema.
>
> When I first read this, I thought you were talking about changing the
> underlying data model based on a flag in the stylesheet. But I think
> that what you're saying is that there are different levels of
> augmentation of the basic XML Infoset that you might be interested in,
> even when validating against a schema. You think that there should be
> a flag that states that the typing information (i.e. typed value and
> type properties on nodes) should be omitted (aside from certain
> attributes having the ID type, presumably?).
>
> I'm assuming (perhaps wrongly) that users will be given the options of
> not validating against a schema at all, using a DTD, or using a schema
> that they've designed specifically to give them the information they
> need in the stylesheet. What do you see as the benefits of giving
> users this option -- of validating against a schema but ignoring some
> of what that tells you -- as well?
>
> Cheers,
> Jeni
> ---
> Jeni Tennison
> http://www.jenitennison.com/

