Re: [xml-dev] xml:base and fragments

From: "Andrew S. Townley" <ast@atownley.org>
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Date: Sun, 7 May 2017 10:54:32 +0200
As this discussion is relevant to some of the things I’ve done and will continue to do, I want to make a couple of points.

To SJL, yes, you can ignore xml:base, but if you do need that functionality, then you need to redefine it in an application-specific or specification-specific way.  This might not be the best plan.

To Eliot and the people involved in the discussion elsewhere, I think the key phrase of the XML Base 2nd Edition Recommendation is:

“The behavior of xml:base attributes in applications based on specifications that do not have direct or indirect normative reference to XML Base is undefined.”

To me, this means that each specification using xml:base needs to formally define the semantics of how it applies in the given XML vocabulary.  If this is done correctly, then the questions below shouldn’t be an issue at all, because the authors define the extent and behavior of the way the xml:base attribute is used.  Compliant processors MUST follow those rules.

More below:

> On May 4, 2017, at 6:21 PM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
> 
> 
>> On May 4, 2017, at 7:07 AM, Eliot Kimber <ekimber@contrext.com> wrote:
>> 
>> The base URI of the ref element is “http://www.dictionary.com/a.html” therefore the fragment identifier *MUST* refer to an element within a.html.
>> ...
>> 
>> From: "John P. McCaskey" <mailbox@johnmccaskey.com>
>> Reply-To: <mailbox@johnmccaskey.com>
>> Date: Thursday, May 4, 2017 at 7:53 AM
>> To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
>> Subject: [xml-dev] xml:base and fragments
>> 
>> A (somewhat Talmudic) discussion on the TEI mailing list (https://listserv.brown.edu/archives/cgi-bin/wa?A0=tei-l) regarding xml:base has come to a conclusion I want to run by XML-DEV readers.
>> 
>> TEI defines some XML attributes as RFC 3986-compliant URIs that fully honor xml:base.
>> 
>> The question is whether a value in this attribute of the form #fragment refers to an XML element in the document that contains the attribute or to a location specified by xml:base.
>> 
>> The (almost) consensus is that to resolve such a standalone fragment, xml:base values should be ignored.
> 
> As one of the exegetes involved in the Talmudic discussion in question, I hasten to note that I have not
> seen anyone suggest that xml:base values be ignored in resolving a relative reference of the form #fragment.
> I don’t know where that characterization comes from, but it’s not a correct paraphrase of any position I have 
> seen taken in the discussion.
> 
>> 
>> In this excerpt from example.xml
>> 
>>  <div xml:base="http://www.dictionary.com/a.html">
>>    <p>
>>      <ref target="#apple">Apple</ref>
>>    </p>
>>  </div>
>> does #apple refer to an element in example.xml that has id="apple" or to http://www.dictionary.com/a.html#apple?
>> 
>> The first, right?
> 
> The verb “refer to” is not a technical term defined by RFC 3986, and it
> is tightly defined neither in ordinary language nor (as far as I can tell) in
> philosophy of language (that is, individual philosophers of language often
> provide sharp definitions of the term, but not necessarily the same or
> equivalent definitions).  So it’s not clear that the question as asked has
> any definite answer.
> 
> I think the question can be paraphrased using defined terms as “Does
> ‘#apple’ identify the resource 
> 
> (1)  …/example.xml#apple
> 
> (where … denotes whatever absolute context was present in the URI
> used to retrieve the document in the first place) or the resource 
> 
> (2)  http://www.dictionary.com/a.html#apple
> 
> ?
> 
> The answer, as I read RFC 3986, is “both”.  
> 
> If the creator of a document is not happy with that answer, then caution 
> should be taken in the use of xml:base and fragment-only identifiers.
> 
> - We know that in the context given ‘#apple’ denotes URI (2), because
> that’s the way xml:base and the rules for absolutizing URIs work.
> 
> - We also know that the reference to ‘#apple’ is a same-document 
> reference, as that term is defined in RFC 3986, because the absolute
> URI of the reference is (2), and the base URI is 
> 
>    http://www.dictionary.com/a.html
> 
> and the two are identical, aside from the fragment identifier.  And
> that is how RFC 3986 defines same-document references.
> 
> - Of same-document URIs, RFC 3986 observes 
> 
>   When a same-document reference is dereferenced for a retrieval
>   action, the target of that reference is defined to be within the same
>   entity (representation, document, or message) as the reference;
>   therefore, a dereference should not result in a new retrieval action.
> 
> - Since the target of the reference "is defined to be within” example.xml,
> it seems that in context, ‘#apple’ identifies the element in example.xml
> with xml:id=“apple”.  That is, (1).
> 
> This all makes some sense if one remembers that the initial purpose
> of the HTML ‘base’ element, whose functionality xml:base is intended
> to replicate, was to provide an in-document representation for the
> document’s URI, for cases where that information is not available
> from the context.  Neither HTML 4.01 nor xml:base actually assigns
> the meaning “this is a URI for this document” to the construct, so
> it’s not necessarily abusive to use it in other ways.  But the discussion
> of same-document references in 3986 does have the effect of 
> defining the targets of any URIs identical to the base URI (aside 
> from the fragment component) as being inside the current document.
> 
> N.B. the discussion here applies even if the reference in question is 
> absolute, not relative — it is not a question of ignoring xml:base when 
> resolving ‘#apple’; the statements in 3986 apply equally to 
> ‘http://www.dictionary.com/a.html#apple’.

The premise that both references are the same depends on the way the vocabulary is defined.  Unless the vocabulary is ill-defined or the document is actually resolvable relative to the xml:base location, I find it hard to see how the answer could be both.
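
To make the quoted argument concrete, here is a rough sketch of the same-document test described in RFC 3986, applied to the example from the thread.  It uses Python's standard urllib.parse; the helper name is_same_document is my own, and treating the xml:base value as the base URI is exactly the assumption under dispute, so take it as an illustration of the reasoning rather than a statement of what any processor does.

```python
from urllib.parse import urljoin, urldefrag

def is_same_document(base_uri, reference):
    """Sketch of the RFC 3986 same-document test: resolve the reference
    against the base, then compare the two with fragments removed."""
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url

# Assumption: the xml:base value from the thread's example is the base URI.
BASE = "http://www.dictionary.com/a.html"

print(urljoin(BASE, "#apple"))             # http://www.dictionary.com/a.html#apple
print(is_same_document(BASE, "#apple"))    # True
print(is_same_document(BASE, "http://www.dictionary.com/a.html#apple"))  # True
print(is_same_document(BASE, "b.html#apple"))                            # False
```

The check says nothing about which physical document a processor should look in; that is the part I would expect the vocabulary's own specification to settle.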

In the above example, once an xml:base is defined it can’t be unset, only redefined.  So if you set the xml:base on a parent element and then tried to refer to a child element within the scope of that xml:base, that element would be inaccessible via xml:id.  To access it, you’d need to use something else, like XPath.

In this case RFC 3986 is actually irrelevant, because the question isn’t about the resources; it’s about how the processing application resolves the addressing of those resources, given the presence of xml:base and the specifications of the vocabularies being processed.  Once a vocabulary says that a given attribute value is a URI, the processing rules for that URI must follow those defined in the XML Base recommendation, PROVIDED that the vocabulary’s specification makes a normative reference to XML Base and indicates that it applies to specific attribute values.

Assuming the vocabulary works in a similar way to XLink, the presence of xml:base denotes a resolution scope for all nested references until the end of the element to which the xml:base attribute has been applied.

Case 1:

If no xml:base is present, relative URI references are resolved in the context of the current document, e.g. ‘#apple’ in this case would denote the element identified as ‘apple’, whether by a vocabulary-specific ID mechanism or by xml:id=“apple”.

Case 2:

If an xml:base is present on a given vocabulary node, AND the vocabulary specifies behavior similar to XLink, THEN references to relative URIs of any form within the scope of the declaring element or its descendants MUST be resolved against the “closest” xml:base value.

E.g.

<e1 xml:base="foo.html">
  <link target="#a"/>
</e1>
<s2 xml:id="a">
  …
  <link target="#a"/>
</s2>

This means that if the value of the target attribute is specified to be a URI, then the resulting URI under the ‘e1’ element MUST be ‘foo.html#a’, regardless of any other rules in place.

The second link target MUST resolve to the ‘[instance]#a’ URI and would be the element ‘s2’ having that XML ID.

This is basically vamping on the first example of the xml:base recommendation, so assuming that the definitions for the given vocabulary are specified the same way, then the same resolution rules would apply.
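
The two cases can be sketched in a few lines of Python, assuming a vocabulary that resolves ‘target’ the XLink way.  Everything beyond the e1/s2 fragment is an assumption for illustration: the <doc> wrapper element (the fragment has two top-level elements, so it needs a root to be parseable), the instance URI http://example.org/instance.xml, and the function name resolve_targets.  The ‘…’ placeholder is left out of the sample.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Assumed wrapper root <doc>; the inner elements are the ones from the example.
SAMPLE = """<doc>
  <e1 xml:base="foo.html">
    <link target="#a"/>
  </e1>
  <s2 xml:id="a">
    <link target="#a"/>
  </s2>
</doc>"""

def resolve_targets(xml_text, document_uri):
    """Return (parent tag, absolute URI) for every <link>, resolving each
    target against the closest xml:base in scope (sketch only)."""
    results = []

    def walk(elem, base, parent_tag=None):
        # xml:base may itself be relative, so resolve it against the inherited base.
        if XML_BASE in elem.attrib:
            base = urljoin(base, elem.attrib[XML_BASE])
        if elem.tag == "link":
            results.append((parent_tag, urljoin(base, elem.attrib["target"])))
        for child in elem:
            walk(child, base, elem.tag)

    walk(ET.fromstring(xml_text), document_uri)
    return results

for parent, uri in resolve_targets(SAMPLE, "http://example.org/instance.xml"):
    print(parent, uri)
# e1 http://example.org/foo.html#a
# s2 http://example.org/instance.xml#a
```

The first link ends up under foo.html and the second under the instance document, which is the behavior described for Case 2 and Case 1 respectively.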

The main point is that the behavior of xml:base is supposed to be defined by the USING specification, not by XML Base itself.  The XML Base recommendation just defines some rules for the attribute and provides examples of how it is used in the most relevant customer specifications at the time, namely XLink.

The focus on the meaning and reference of RFC 3986 is a red herring and not the issue to be discussed/resolved at all.  The issue is making sure the specification for the vocabulary USING xml:base fully and completely defines the semantics of how that attribute should be interpreted for all vocabulary elements containing URIs.

Conceivably, the given vocabulary could define attributes that were aware of XML Base for URI resolution and some that were not.  However, this is totally up to the given vocabulary and independent of XML Base or RFC 3986.
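
As a purely hypothetical illustration of that last point, a processor for such a vocabulary might carry a list of the attributes its specification declares to be XML Base-aware.  The attribute names (‘target’, ‘corresp’), the set BASE_AWARE_ATTRS and the function resolve_attr are all invented for this sketch and are not taken from any real specification.

```python
from urllib.parse import urljoin

# Hypothetical rule from an imagined vocabulary spec:
# only these attributes honor xml:base.
BASE_AWARE_ATTRS = {"target"}

def resolve_attr(name, value, xml_base, document_uri):
    """Resolve a URI-valued attribute, applying xml:base only when the
    (assumed) vocabulary says the attribute is XML Base-aware."""
    if xml_base and name in BASE_AWARE_ATTRS:
        base = urljoin(document_uri, xml_base)
    else:
        base = document_uri
    return urljoin(base, value)

DOC = "http://example.org/instance.xml"  # assumed instance URI
print(resolve_attr("target", "#a", "foo.html", DOC))   # http://example.org/foo.html#a
print(resolve_attr("corresp", "#a", "foo.html", DOC))  # http://example.org/instance.xml#a
```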

Where did the initial confusion actually come from?

Cheers,

ast

> 
> 
> 
> ********************************************
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> cmsmcq@blackmesatech.com
> http://www.blackmesatech.com
> ********************************************
> 
> 

--
Andrew S. Townley <ast@atownley.org>
http://atownley.org
Follow-Ups:
- Re: [xml-dev] xml:base and fragments
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
References:
- xml:base and fragments
  - From: "John P. McCaskey" <mailbox@johnmccaskey.com>
- Re: [xml-dev] xml:base and fragments
  - From: Eliot Kimber <ekimber@contrext.com>
- Re: [xml-dev] xml:base and fragments
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>