Re: [xml-dev] xml:base and fragments

> On May 7, 2017, at 3:07 PM, Andrew S. Townley <ast@atownley.org> wrote:
> ...
>> On May 7, 2017, at 9:43 PM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>> ...
>> Also — where on earth does this idea arise that the questions below relate to the interpretation of the
>> xml:base spec?  The semantics which some people find unexpected is defined by RFC 3986, not
>> by XML Base.  It is not the use of XML Base, but the use of URIs, that brings the semantics in question
>> into one’s application.
> I wouldn’t see it that way, but maybe it’s just me.
> To my mind, the use of xml:base and the way that influences the interpretation of values within a given vocabulary is completely relevant because it’s the rules of XML base that construct the URIs to be resolved.  XML Base comes first, then 3986 comes second.

Very few people have specified what they think is a problem here, but I expect most people who are unhappy with the interpretations offered for various examples earlier in this thread are unhappy with the conclusion that given a document URI D, a base URI B, and a fragment-only relative reference #R, #R has the absolute form B#R and also denotes the fragment R within the current document (so it denotes the same thing as D#R).  If this is true, then they are of course within their rights to be unhappy.  But the conclusion that #R identifies both B#R and D#R is not a function of one’s choice of base URI, or of the mechanism used to specify a base URI.  It is a function of the way the resolution rules in RFC 3986 define the interaction of document URIs, base URIs, and relative references.

> More specifically, an application processing a document using xml:base must construct a URI on the basis of the current xml:base value, if any *before* that URI is to be resolved by any application processing the document.  

No.  “URI resolution” is the process of constructing an absolute URI from a relative URI by absolutizing it against the relevant base URI.

It is not to be confused with “dereferencing”, the process of performing some action (e.g. a GET or a PUT) on a URI.

And neither is to be confused with “retrieval”, which is the fetching of data from an external source.  (If a browser has already loaded a document, it does not need to retrieve the document again in order to dereference a fragment in the current document; it can and should simply scroll to the new location.)

The first two of these terms are defined and distinguished in RFC 3986 section 1.2.2, “Separating Identification from Interaction”.
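
To make the distinction concrete, here is a minimal sketch (not part of the original exchange) of resolution as a pure computation on strings.  It uses Python's standard urllib.parse.urljoin as a stand-in for the RFC 3986 resolution algorithm; the base URI and the helper name resolve are hypothetical, chosen only for illustration.  No dereferencing or retrieval takes place anywhere in it.

```python
from urllib.parse import urljoin

# Hypothetical base URI, used only for illustration.
base = "http://example.org/dir/doc.xml"


def resolve(base_uri, reference):
    """Absolutize a relative reference against a base URI.

    This is resolution only: a string computation. Nothing is
    dereferenced and nothing is retrieved from the network.
    """
    return urljoin(base_uri, reference)


# Three kinds of relative reference, all resolved the same way.
for reference in ("other.xml", "../up.xml", "#sec2"):
    print(reference, "->", resolve(base, reference))
```

Whether anything is later fetched, or merely scrolled to, is a separate question from what these calls compute.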

> Therefore, the result of applying XML Base is a fully-qualified URI, not a relative URI.

Yes, it is.  It is the fully qualified (= absolute) URI produced by applying the rules of RFC 3986 to the base URI and the relative reference.  We know it’s the result of the RFC 3986 rules, because those are the rules XML Base normatively refers to for creating an absolute URI from a relative reference, and the only ones specified in the XML Base spec.

Can you point to the words in XML Base which you believe explain how to produce this result and which do not (if I understand your claim correctly) appeal to any rules in 3986? 

> The issue here comes back to some of Simon’s comments just now about the needs of authors vs. application developers.
> Again, maybe it’s just me, but xml:base exists to provide document authors a shorthand for URI references to elements that belong to the same URI structure. In this way, I see them much more like namespace prefixes than I do anything else, and that’s why I suggest the processing model above and also why I say that the way this question should be answered actually has nothing to do with 3986 at all.

You are not alone, I think.  Many people do regard the point of in-content base URIs, as provided by XML Base and by the HTML ‘base’ element, in just that way.  Others, as you probably know, regard the purpose of in-content base URIs as being to provide a meaningful base URI in contexts where for some reason the retrieval URI is not otherwise available.  Since the editors of the relevant specifications include people of both opinions, it is not surprising if the specs occasionally exhibit an uneasy truce between the two positions.

But it’s not clear to me why a technology like xml:base having a particular purpose leads you to believe that questions about the meaning of certain constructs like the two #a references in your earlier example should be answered without considering the normative text which assigns meaning to those constructs.  It happens that that normative text is found in RFC 3986.    

> If you are expecting 3986 to take care of this for you, then I don’t see the value of specifying and using xml:base at all, because to my reading, they apply different sets of rules.  

They define overlapping sets of rules:  since xml:base refers normatively to RFC 3986, the one set is a superset of the other.

RFC 3986 defines a generic syntax for URIs and prescribes rules for reference resolution, including rules for establishing a base URI (which include the possibility of base URIs embedded in the content of the entity, for certain media types) and rules for resolving relative references against a base URI.  It also addresses a few other issues.

XML Base, by contrast, limits itself to providing one mechanism for embedding base URIs within content, suitable for XML documents (application/xml and other), just as various versions of the HTML spec define a different mechanism for embedding base URIs within content (for text/html and some other media types).  It refers to RFC 3986 for the explanation of what it means to have a base URI and how it affects the denotation of references.

Without xml:base, an XML application would need to specify its own mechanism for embedding base URIs in content.  There’s nothing to make that impossible, but when xml:base provides an adequate mechanism, there’s no pressing need to invent another.
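
As a rough illustration of this division of labor, the sketch below (an addition, not code from any of the specs) walks an XML tree and records the base URI in scope for each element, then hands the actual resolution to urllib.parse.urljoin.  The function name in_scope_bases, the driver URI http://example.org/driver.xml, and the sample document are assumptions made up for the example; the code assumes every xml:base value is a plain URI reference and ignores entity boundaries.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predeclared XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def in_scope_bases(root, document_uri):
    """Map each element to the base URI in effect for it (a sketch)."""
    bases = {}

    def walk(elem, inherited):
        # An xml:base value may itself be relative, so it is resolved
        # against the inherited base; an absent attribute (empty
        # reference) leaves the inherited base unchanged.
        current = urljoin(inherited, elem.get(XML_BASE, ""))
        bases[elem] = current
        for child in elem:
            walk(child, current)

    walk(root, document_uri)
    return bases


# Hypothetical document and document URI, for illustration only.
doc = ET.fromstring(
    "<doc>"
    '<div xml:base="http://www.dictionary.com/a.html">'
    '<ref target="#apple">Apple</ref>'
    "</div>"
    '<ref target="#pear">Pear</ref>'
    "</doc>"
)
bases = in_scope_bases(doc, "http://example.org/driver.xml")
resolved = {
    ref.get("target"): urljoin(bases[ref], ref.get("target"))
    for ref in doc.iter("ref")
}
print(resolved)
```

The only part specific to xml:base is the attribute lookup in walk; everything about what the references come to mean is done by the generic resolution step.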

> One is about creating URI values and the other is about resolving those.  In the context of this discussion, they are subtle but important differences.

The question with which the discussion began was, essentially, “What is identified by the target attribute in the following document?”, where the ‘target’ attribute is understood as containing a URI (here a relative reference) and the base URI is given by xml:base.

  <div xml:base="http://www.dictionary.com/a.html">
      <ref target="#apple">Apple</ref>

It is a question about the meaning of a given relative reference in a document.  

It is not a question about how best to support authoring, or how best to simplify implementations of URI-aware software, or how to make spec-casuists happy.  It is not a question about what network activity should or should not follow a request to dereference the URI “#apple”, and if it were, it would reduce to the question about what the reference identifies, because a conforming browser will dereference the URI by fetching (if need be) the object identified, and then displaying the relevant fragment (so the question about dereferencing behavior reduces to the question about meaning).

RFC 3986 and xml:base do indeed have importantly different roles here.  The xml:base spec tells us that the base URI to use in resolving “#apple” is http://www.dictionary.com/a.html, and RFC 3986 tells us (a) that the absolute form of the reference is http://www.dictionary.com/a.html#apple and (b) that the target of the reference is by definition contained within the current document and dereferencing it “should not” launch a new retrieval action. 
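
Both halves of that statement can be checked mechanically.  The sketch below is an illustration added here, with urllib.parse standing in for the RFC 3986 rules and is_same_document a hypothetical helper name.  It computes the absolute form for (a) and applies the same-document test of RFC 3986 section 4.4 for (b): a reference is a same-document reference when, fragment aside, it resolves to the base URI.  The comparison is on raw strings; the normalization the RFC recommends before comparing is omitted.

```python
from urllib.parse import urldefrag, urljoin

base = "http://www.dictionary.com/a.html"


def is_same_document(base_uri, reference):
    """Same-document test in the sense of RFC 3986 section 4.4.

    Simplification: compares strings without URI normalization.
    """
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url


# (a) the absolute form of the reference
absolute = urljoin(base, "#apple")
print(absolute)

# (b) whether dereferencing should stay within the current document
print(is_same_document(base, "#apple"))
print(is_same_document(base, "b.html#apple"))
```

Note that neither function is told what the document URI is; the same-document status follows from the base URI and the reference alone.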

To return to the topic with which this note started:

A number of people unhappy with being told both (a) and (b) have suggested on this list that they think xml:base has done something wrong; some have suggested that HTML ‘base’ does better.  Since the only role xml:base plays in this scenario is identifying the base URI and referring normatively to RFC 3986, and since a corresponding example is trivially constructible with HTML, I have thus far found their analysis unconvincing.  

For myself, I’m not sure I think there is a problem here.  The meaning of the construct is perfectly well defined; if you don’t want your references to have the double meaning of referring both to something in the current document and something in what you believe to be a different document, then it’s not hard to avoid constructing such references.  Other things being equal, having a less surprising meaning attached to the example would probably be preferable, but it is very very difficult to make rules for URI resolution which guarantee that “#foo” will always refer to the current document and which allow the in-content specification of arbitrary base URIs and which have no odd consequences or other blemishes.  

And in some cases, the ‘double’ interpretation is perfectly sane and sensible, not a problem at all.  In a driver document which uses XInclude to embed various subordinate documents, the line in the source that looks like

    <xi:include href="http://www.dictionary.com/a.html"/>

might result in a div like that shown above in the output.  The reference “#apple” is, when it occurs within http://www.dictionary.com/a.html, definitely a reference to the fragment “apple” within that document.  Since that document has now been embedded in a larger document, it is equally definitely also a reference to the fragment “apple” in the current document, even if the driver file for the XInclude processing is at some very different URI.

C. M. Sperberg-McQueen
Black Mesa Technologies LLC
