Re: [xml-dev] xml:base and fragments

> On May 7, 2017, at 2:54 AM, Andrew S. Townley <ast@atownley.org> wrote:
> …  I think the key phrase of the XML Base 2nd Edition Recommendation is:
> “The behavior of xml:base attributes in applications based on specifications that do not have direct or indirect normative reference to XML Base is undefined.”
> To me, this means that each specification using xml:base needs to formally define the semantics of how it applies in the given XML vocabulary.

How do you get a spec using xml:base that does not have direct or indirect normative reference to the
XML Base spec?

You might get applications with no normative reference to XML Base, asked to process data which 
contains xml:base (i.e. data defined by some other spec that does depend on XML Base) — I think
that’s what the sentence quoted is talking about.

>  If this is done correctly, then the questions below shouldn’t be an issue at all because the authors define the extent and behavior of the way the xml:base attribute is used.  Compliant processors MUST follow those rules.

Also — where on earth does this idea arise that the questions below relate to the interpretation of the
xml:base spec?  The semantics which some people find unexpected is defined by RFC 3986, not
by XML Base.  It is not the use of XML Base, but the use of URIs, that brings the semantics in question
into one’s application.

> More below:
>> On May 4, 2017, at 6:21 PM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>> ...
>>> In this excerpt from example.xml
>>> <div xml:base="http://www.dictionary.com/a.html">
>>>   <p>
>>>     <ref target="#apple">Apple</ref>
>>>   </p>
>>> </div>
>>> does #apple refer to an element in example.xml that has id="apple" or to http://www.dictionary.com/a.html#apple?
>>> The first, right?
>> ...
>> I think the question can be paraphrased using defined terms as “Does
>> ‘#apple’ identify the resource 
>> (1)  …/example.xml#apple
>> (where … denotes whatever absolute context was present in the URI
>> used to retrieve the document in the first place) or the resource 
>> (2)  http://www.dictionary.com/a.html#apple
>> ?
>> The answer, as I read RFC 3986, is “both”.  
>> ...
> The premise that both references are the same depends on the way the vocabulary is defined.    

I am not sure what you mean here by “both references are the same”.  By “reference” do you mean the pointer, or the thing pointed at?

> Unless the vocabulary is ill-defined or the document is actually resolvable relative to the xml:base location, I find it hard to see how the answer could be both.

RFC 3986 says clearly that 

- the relative reference “#apple” in the example has the absolute form http://www.dictionary.com/a.html#apple 
- the relative reference “#apple” is a same-document reference
- the relative reference “#apple” identifies a fragment of the document retrieved, in the example, from …/example.xml

I don’t know how to accept the third of these without also accepting the proposition that the relative reference “#apple” in the example identifies the same fragment of …/example.xml as is identified by …/example.xml#apple, but other people may find a way.
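
The first two of those claims can be checked mechanically. What follows is only a minimal sketch, assuming Python's standard urllib.parse as the resolver; the helper names resolve and is_same_document are mine, not anything defined by the RFC or by TEI. It applies the test in section 4.4 of RFC 3986: a reference is a same-document reference when the resolved URI is identical to the base URI aside from the fragment.

```python
from urllib.parse import urljoin, urldefrag

# Base URI taken from the xml:base attribute in the example.
BASE = "http://www.dictionary.com/a.html"


def resolve(base_uri, reference):
    """Resolve a URI reference against a base URI (RFC 3986, section 5)."""
    return urljoin(base_uri, reference)


def is_same_document(base_uri, reference):
    """True if the resolved reference equals the base URI, fragments aside."""
    target, _ = urldefrag(urljoin(base_uri, reference))
    base, _ = urldefrag(base_uri)
    return target == base


print(resolve(BASE, "#apple"))                 # http://www.dictionary.com/a.html#apple
print(is_same_document(BASE, "#apple"))        # True
print(is_same_document(BASE, "b.html#apple"))  # False
```

The same reference thus has the absolute form given by the xml:base value and still counts as a same-document reference; the code reports both facts without choosing between them.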

You are right that it may depend on the document vocabulary.  In the example, the vocabulary is TEI, and the TEI documentation says that the target attribute is a URI.  Which means that 3986 applies.  

> In the above example, since once an xml:base is defined, it can’t be unset, only redefined, if you set the xml:base in a parent element, then tried to refer to a child element within the scope of a given xml:base, that element would be inaccessible via xml:id.  To access it, you’d need to use something else like XPath.

Why?  RFC 3986 is explicit that “#apple” in the example is a same-document reference.

> In this case RFC 3986 is irrelevant, actually, because the question isn’t about the resources, it’s about the way the addressing of those resources is resolved by the processing application in light of the presence of xml:base and the specification about the vocabularies being processed.  

The vocabulary in question says that the attribute value in question is a URI.  The resolution of relative URI references is normatively defined by RFC 3986.

> Once the vocabularies say that a given attribute value is a URI, then the processing rules for that URI must follow those defined in the XML Base recommendation PROVIDING that the vocabulary’s specification provides a normative reference to XML Base and indicates that it applies to specific attribute values.

That is given, I am told, in the TEI case.  (I have not inspected the current state of the TEI Guidelines myself.)

> Assuming the vocabulary works in a similar way to XLink, the presence of xml:base denotes a resolution scope for all nested references until the end of the element to which the xml:base attribute has been applied.

I think it would be clearer to say that the presence of an xml:base attribute sets the base URI for the element on which the attribute occurs and for all of its descendants which have no nearer ancestor-or-self element with an xml:base attribute.  That’s the way XML Base defines the semantics of xml:base.
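
That scoping rule can be sketched in a few lines. This is an illustration under stated assumptions, not a conforming XML Base processor: it uses Python's xml.etree.ElementTree, the retrieval URI http://example.org/example.xml is hypothetical, the nested div with xml:base="b.html" and the trailing ref are additions of mine to show inheritance and override, and ref_bases is my own name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical URI the document was retrieved from.
DOC_URI = "http://example.org/example.xml"

SOURCE = """<doc>
  <div xml:base="http://www.dictionary.com/a.html">
    <p><ref target="#apple">Apple</ref></p>
    <div xml:base="b.html"><ref target="#pear"/></div>
  </div>
  <ref target="#plum"/>
</doc>"""


def ref_bases(root, document_uri):
    """Return (target, base URI) for every ref element, in document order."""
    rows = []

    def walk(elem, inherited):
        declared = elem.get(XML_BASE)
        # An xml:base value may itself be relative; it is resolved against
        # the base URI inherited from the nearest ancestor (or the document).
        base = urljoin(inherited, declared) if declared is not None else inherited
        if elem.tag == "ref":
            rows.append((elem.get("target"), base))
        for child in elem:
            walk(child, base)

    walk(root, document_uri)
    return rows


for target, base in ref_bases(ET.fromstring(SOURCE), DOC_URI):
    print(target, base)
```

Each ref picks up the base URI of its nearest ancestor-or-self element carrying xml:base, and the last ref, which has no such ancestor, falls back to the document's own URI.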

> Case 1:
> If no xml:base is present, relative URI references are resolved in the content of the current document, e.g. ‘#apple’ in this case would represent the vocabulary-specific element identified by ‘#apple’ or using xml:id=“#apple”.
> Case 2:
> If an xml:base is present on a given vocabulary node, AND the vocabulary specifies behavior similar to XLink, THEN, the references to relative URIs of any form within the scope of the declaring element or children of the declaring element MUST be resolved in terms of the “closest” xml:base value.
> E.g.
> <e1 xml:base="foo.html">
> <link target="#a"/>
> </e1>
> <s2 xml:id="a">
> …
>  <link target="#a"/>
> </s2>
> Means that if the value of the target attribute is specified to be a URI, then the resulting URI under the ‘e1’ element MUST be ‘foo.html#a’, regardless of any other rules in place.

And so it does.

It is ALSO, however, defined by RFC 3986 as a same-document reference.   So saying only that it 
resolves to foo.html#a is at best an incomplete account of the situation:  the defining document for
rules of URI resolution says that dereferencing the first link in your example “should not” cause a
new retrieval action, because the resource addressed is present in the document already loaded.

It is the co-existence of these two claims (the reference to #a refers to foo.html#a, the 
reference to #a can be dereferenced by locating the id “a” in the current document) which
appears to make some readers uncomfortable. It made a number of people uncomfortable
when the text of what is now 3986 was being prepared.  But the current text licenses both
of those claims.  (And having had occasion to think about the issues in the last few days,
I can’t say there are any obvious alternatives which would not license other equally
uncomfortable inferences.)
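
To make the co-existence concrete, here is a rough sketch of how a processor might report both facts for the two links in the quoted example. The assumptions are mine: the two fragments are wrapped in a hypothetical doc root so that they parse, the retrieval URI http://example.org/instance.xml is invented, and describe_links is an illustrative helper built on Python's standard library, not an API from any of the specifications under discussion.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urldefrag

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical URI the instance was retrieved from.
DOC_URI = "http://example.org/instance.xml"

# The quoted example, wrapped in an assumed root element.
SOURCE = """<doc>
  <e1 xml:base="foo.html">
    <link target="#a"/>
  </e1>
  <s2 xml:id="a">
    <link target="#a"/>
  </s2>
</doc>"""


def describe_links(root, document_uri):
    """For each link: (absolute URI, same-document?, tag of local target)."""
    ids = {e.get(XML_NS + "id"): e for e in root.iter() if e.get(XML_NS + "id")}
    rows = []

    def walk(elem, base):
        declared = elem.get(XML_NS + "base")
        if declared is not None:
            base = urljoin(base, declared)
        if elem.tag == "link":
            absolute = urljoin(base, elem.get("target"))
            doc_part, fragment = urldefrag(absolute)
            # RFC 3986, section 4.4: identical to the base URI aside from the
            # fragment means no new retrieval; look in the current document.
            same_document = doc_part == urldefrag(base)[0]
            local = ids.get(fragment) if same_document else None
            rows.append((absolute, same_document,
                         local.tag if local is not None else None))
        for child in elem:
            walk(child, base)

    walk(root, document_uri)
    return rows


for row in describe_links(ET.fromstring(SOURCE), DOC_URI):
    print(row)
# ('http://example.org/foo.html#a', True, 's2')
# ('http://example.org/instance.xml#a', True, 's2')
```

Under these assumptions the first link has the absolute form …/foo.html#a and is nonetheless dereferenced, without retrieval, to the s2 element in the current document; that is the double interpretation in miniature.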

> The second link target MUST resolve to the ‘[instance]#a’ URI and would be the element ‘s2’ having that XML ID.
> This is basically vamping on the first example of the xml:base recommendation, so assuming that the definitions for the given vocabulary are specified the same way, then the same resolution rules would apply.
> The main point is the behavior of xml:base is supposed to be defined by the USING specification, not xml:base.  The XML Base recommendation just defines some rules for the attribute and provides examples of how this is used in the most relevant customer specifications at the time, namely XLink.
> The focus on the meaning and reference of RFC 3986 is a red herring and not the issue to be discussed/resolved at all.  The issue is making sure the specification for the vocabulary USING xml:base fully and completely defines the semantics of how that attribute should be interpreted for all vocabulary elements containing URIs.

Any vocabulary that claims to be using URIs and is not using the rules for resolving URI references given in 3986 is not, in fact, using URIs.

> Conceivably, the given vocabulary could define attributes that were aware of XML Base for URI resolution and some that were not.  

That strikes me as being a logical contradiction.  If the latter class are said to be URIs, then the only way to resolve them is with reference to the base URI.  If xml:base is being used, then the base URI is as given by the xml:base attributes and the rules in XML Base and 3986.

> However, this is totally up to the given vocabulary and independent of XML Base or RFC 3986.

I think I disagree.

> Where did the initial confusion actually come from?

I believe it came from a question about whether “#foo” would point to something else in the same document or to something outside the document; it is not clear to me that anyone other than me takes seriously the possibility that the rules of RFC 3986 might say (or entail) that it is both.

It is in many ways a tempest in a teapot, since it’s easy enough to avoid having any URIs with this unusual sort of double interpretation.  But that requires understanding how the double interpretation arises (and that, in turn, requires taking seriously the words in the RFC).

C. M. Sperberg-McQueen
Black Mesa Technologies LLC
