Re: [xml-dev] xml:base and fragments
- From: "Andrew S. Townley" <ast@atownley.org>
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Date: Sun, 7 May 2017 23:07:58 +0200
First, an apology. It seems my xml-dev mail delivery has issues, so I’d only originally seen a couple of the mails on this topic before I replied. Wondering further led me to the archives, so I discovered much of what I’d previously said had been covered.
> On May 7, 2017, at 9:43 PM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
>
>> On May 7, 2017, at 2:54 AM, Andrew S. Townley <ast@atownley.org> wrote:
>>
>> … I think the key phrase of the XML Base 2nd Edition Recommendation is:
>>
>> “The behavior of xml:base attributes in applications based on specifications that do not have direct or indirect normative reference to XML Base is undefined.”
>>
>> To me, this means that each specification using xml:base needs to formally define the semantics of how it applies in the given XML vocabulary.
>
> How do you get a spec using xml:base that does not have direct or indirect normative reference to the
> XML Base spec?
>
> You might get applications with no normative reference to XML Base, asked to process data which
> contains xml:base (i.e. data defined by some other spec that does depend on XML Base) — I think
> that’s what the sentence quoted is talking about.
In the literal sense, you’re correct. However, as Michael’s commentary on XSLT points out, what I was actually trying to say was that the specific resolution semantics are supposed to be defined in the specification using xml:base, not in the XML Base specification itself.
>
>
>> If this is done correctly, then the questions below shouldn’t be an issue at all because the authors define the extent and behavior of the way the xml:base attribute is used. Compliant processors MUST follow those rules.
>
> Also — where on earth does this idea arise that the questions below relate to the interpretation of the
> xml:base spec? The semantics which some people find unexpected is defined by RFC 3986, not
> by XML Base. It is not the use of XML Base, but the use of URIs, that bring the semantics in question
> into one’s application.
I wouldn’t see it that way, but maybe it’s just me.
To my mind, the use of xml:base and the way that influences the interpretation of values within a given vocabulary is completely relevant because it’s the rules of XML Base that construct the URIs to be resolved. XML Base comes first, then 3986 comes second.
More specifically, an application processing a document using xml:base must construct a URI on the basis of the current xml:base value, if any, *before* that URI is resolved by any application processing the document. Therefore, the result of applying XML Base is a fully-qualified URI, not a relative URI.
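A minimal sketch of this two-step model, in Python for illustration only. The function name `build_reference` and the retrieval URI `http://example.org/example.xml` are assumptions made for the sketch; the xml:base value and the `#apple` reference come from the example.xml excerpt quoted further down. It shows only the first step (constructing the absolute URI at the content layer) and says nothing about how that URI is later dereferenced.

```python
from urllib.parse import urljoin


def build_reference(xml_base, attr_value, document_uri):
    """Content-layer step: turn an attribute value into an absolute URI.

    The in-scope xml:base (if any) is first resolved against the
    document's own URI, then used as the base for the attribute value.
    """
    if xml_base is not None:
        base = urljoin(document_uri, xml_base)
    else:
        base = document_uri
    return urljoin(base, attr_value)


# Assumed retrieval URI of the document; not given in the thread.
doc_uri = "http://example.org/example.xml"

with_base = build_reference("http://www.dictionary.com/a.html", "#apple", doc_uri)
without_base = build_reference(None, "#apple", doc_uri)

print(with_base)     # http://www.dictionary.com/a.html#apple
print(without_base)  # http://example.org/example.xml#apple
```

Whatever happens next (retrieval, same-document handling) would operate on the absolute URI produced here.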
The issue here comes back to some of Simon’s comments just now about the needs of authors vs. application developers.
Again, maybe it’s just me, but xml:base exists to provide document authors a shorthand for URI references to elements that belong to the same URI structure. In this way, I see them much more like namespace prefixes than I do anything else, and that’s why I suggest the processing model above and also why I say that the way this question should be answered actually has nothing to do with 3986 at all.
If you are depending on 3986 to take care of this for you, then I don’t see the value of specifying and using xml:base at all, because to my reading, they apply different sets of rules. One is about creating URI values and the other is about resolving those. In the context of this discussion, they are subtle but important differences.
>
>>
>> More below:
>>
>>> On May 4, 2017, at 6:21 PM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>>>
>>> ...
>>>
>>>>
>>>> In this excerpt from example.xml
>>>>
>>>> <div xml:base="http://www.dictionary.com/a.html">
>>>> <p>
>>>> <ref target="#apple">Apple</ref>
>>>> </p>
>>>> </div>
>>>> does #apple refer to an element in example.xml that has id="apple" or to http://www.dictionary.com/a.html#apple?
>>>>
>>>> The first, right?
>>>
>>> ...
>>> I think the question can be paraphrased using defined terms as “Does
>>> ‘#apple’ identify the resource
>>>
>>> (1) …/example.xml#apple
>>>
>>> (where … denotes whatever absolute context was present in the URI
>>> used to retrieve the document in the first place) or the resource
>>>
>>> (2) http://www.dictionary.com/a.html#apple
>>>
>>> ?
>>>
>>> The answer, as I read RFC 3986, is “both”.
>>> ...
>>
>> The premise that both references are the same depends on the way the vocabulary is defined.
>
> I am not sure what you mean here by “both references are the same”. By “reference” do you mean the pointer, or the thing pointed at?
If I wanted to say resource, I would’ve. I’m talking about references to elements/items within a given resource. URIs are references that are resolved to resources with optional granular references to individual content elements.
>
>> Unless the vocabulary is ill-defined or the document is actually resolvable relative to the xml:base location, I find it hard to see how the answer could be both.
>
> RFC 3986 says clearly that
>
> - the relative reference “#apple” in the example has the absolute form http://www.dictionary.com/a.html#apple
> - the relative reference “#apple” is a same-document reference
> - the relative reference “#apple” identifies a fragment of the document retrieved, in the example, from …/example.xml
>
> I don’t know how to accept the third of these without also accepting the proposition that the relative reference “#apple” in the example identifies the same fragment of …/example.xml as is identified by …/example.xml#apple but other people may find a way.
>
> You are right that it may depend on the document vocabulary. In the example, the vocabulary is TEI, and the TEI documentation says that the target attribute is a URI. Which means that 3986 applies.
I’m aware of what the URI RFC says, but as I stated above, I don’t see that it applies until AFTER xml:base processing has occurred. You’re trying to say that xml:base influences the resolution of the relative URI from the domain of the URI itself. That’s a different dimension than the content of the document, and this is why I think there’s confusion about how this should really work.
Since it’s an XML attribute, xml:base should influence the interpretation and potential alteration of the *content* of the document, not the actions taken upon that content, e.g. resolution of URIs. They are orthogonal, and conflating these the way you seem to propose seems like breaking some abstractions somewhere along the line.
Using this layered processing approach, the TEI documentation is still valid, and authors can use xml:base to influence the creation of the URIs to be resolved by the processing application. Once those URIs have been created, they’re resolved according to 3986.
>
>>
>> In the above example, since once an xml:base is defined, it can’t be unset, only redefined, if you set the xml:base in a parent element, then tried to refer to a child element within the scope of a given xml:base, that element would be inaccessible via xml:id. To access it, you’d need to use something else like XPath.
>
> Why? RFC 3986 is explicit that “#apple” in the example is a same-document reference.
Without xml:base, yes. In the presence of xml:base, not necessarily.
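For reference, the same-document test being argued over can be sketched as below. This is one reading of RFC 3986 section 4.4 (a reference is a same-document reference when its target, ignoring any fragment, is identical to the base URI), written in Python; the helper name `is_same_document` is hypothetical. The sketch takes whatever base URI is in effect as its input, so it does not settle whether xml:base processing should happen before the RFC's rules are consulted.

```python
from urllib.parse import urljoin, urldefrag


def is_same_document(base_uri, reference):
    """Resolve the reference against the base URI, then compare the
    two with their fragments stripped (RFC 3986, section 4.4)."""
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url


# Base URI in effect inside the <div> of the quoted example,
# i.e. the value of its xml:base attribute.
base = "http://www.dictionary.com/a.html"

print(is_same_document(base, "#apple"))        # True
print(is_same_document(base, "b.html#apple"))  # False
```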
My interpretation is directly in-line with Section 3.1 URI Reference Encoding and Escaping because the processing application should generate unescaped URI references. The only way this can work in practice is if the interpretation of xml:base and the attributes representing the URI fully takes place before RFC 3986 resolution kicks in.
Since Section 4.1 says only that xml:base applies when the base URI is embedded in the document’s content for the purposes of RFC 3986 resolution, and since Section 4.2 states that the URI of the document or external entity should be used only after xml:base processing (if defined), my interpretation doesn’t lead to any ambiguity, and it doesn’t short-circuit regular RFC 3986 resolution in the case where xml:base has not been defined.
In both #1 and #2 of Section 4.2, content processing and interpretation at the content layer is what’s being discussed, not addressing at the transport layer—even though they’re extremely closely related and using the same terminology.
>
>>
>> In this case RFC 3986 is irrelevant, actually, because the question isn’t about the resources, it’s about the way the addressing of those resources is resolved by the processing application in light of the presence of xml:base and the specification about the vocabularies being processed.
>
> The vocabulary in question says that the attribute value in question is a URI. The resolution of relative URI references is normatively defined by RFC 3986.
But relative URI references defined by xml:base are only defined in XML Base, not RFC 3986. Section 5.1.1 of RFC 3986 defines that these exist, and that they should be interpreted “somehow.” Even in this case, however, the creation of a resolvable URI from within the content is governed by the rules of the content vocabulary, not RFC 3986.
As an option, content processors could choose to munge these operations together, but that means you end up with the confusion in the original question. If you let the content rules do their thing before the resolution rules in RFC 3986, then actually, you never get relative references that shouldn’t be resolved against the scope of the encapsulating entity—ever.
>
>
>> Once the vocabularies say that a given attribute value is a URI, then the processing rules for that URI must follow those defined in the XML Base recommendation PROVIDING that the vocabulary’s specification provides a normative reference to XML Base and indicates that it applies to specific attribute values.
>
> That is given, I am told, in the TEI case. (I have not inspected the current state of the TEI Guidelines myself.)
>
>>
>> Assuming the vocabulary works in a similar way to XLink, the presence of xml:base denotes a resolution scope for all nested references until the end of the element to which the xml:base attribute has been applied.
>
> I think it would be clearer to say that the presence of an xml:base attribute sets the base URI for the element on which the attribute occurs and for all of its descendants which have no nearer ancestor-or-self element with an xml:base attribute. That’s the way XML Base defines the semantics of xml:base.
>
Po-ta-toe, po-tat-O… ;)
I understand them both the same way, and I was using different terminology in case the terminology was the issue at hand.
>
>>
>> Case 1:
>>
>> If no xml:base is present, relative URI references are resolved in the context of the current document, e.g. ‘#apple’ in this case would represent the vocabulary-specific element identified by ‘#apple’ or using xml:id=“apple”.
>>
>> Case 2:
>>
>> If an xml:base is present on a given vocabulary node, AND the vocabulary specifies behavior similar to XLink, THEN, the references to relative URIs of any form within the scope of the declaring element or children of the declaring element MUST be resolved in terms of the “closest” xml:base value.
>>
>> E.g.
>>
>> <e1 xml:base="foo.html">
>> <link target="#a"/>
>> </e1>
>> <s2 xml:id="a">
>> …
>> <link target="#a"/>
>> </s2>
>>
>> Means that if the value of the target attribute is specified to be a URI, then the resulting URI under the ‘e1’ element MUST be ‘foo.html#a’, regardless of any other rules in place.
>
> And so it does.
>
> It is ALSO, however, defined by RFC 3986 as a same-document reference. So saying only that it
> resolves to foo.html#a is at best an incomplete account of the situation: the defining document for
> rules of URI resolution says that dereferencing the first link in your example “should not” cause a
> new retrieval action, because the resource addressed is present in the document already loaded.
>
> It is the co-existence of these two claims (the reference to #a refers to foo.html#a, the
> reference to #a can be dereferenced by locating the id “a” in the current document) which
> appears to make some readers uncomfortable. It made a number of people uncomfortable
> when the text of what is now 3986 was being prepared. But the current text licenses both
> of those claims. (And having had occasion to think about the issues in the last few days,
> I can’t say there are any obvious alternatives which would not license other equally
> uncomfortable inferences.)
Actually, it isn’t. Content rules apply before RFC rules apply, so there’s not a conflict or incomplete situation.
Retrieval actions are again an orthogonal problem. The question is “what’s the fully-qualified URI of any given reference in the source vocabulary?” Resolution rules apply only *after* this question is answered.
I think the issue is still inadequate separation of concerns between the content/authoring layer and the processing/transfer layer. If you keep these separate, there’s no overlap or conflict, and there are no uncomfortable situations other than getting your head around the fact that these are two different layers of architecture.
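Taking the e1/s2 example quoted above, the question “what’s the fully-qualified URI of each reference?” can be sketched in Python as follows. Several things here are assumptions for the sake of a runnable example: the wrapping `<doc>` element (the quoted fragment has no single root), the instance URI `http://example.org/instance.xml`, and the helper name `base_uris`. The base URI of an element is taken from its nearest ancestor-or-self xml:base, which matches the scoping rule both descriptions above agree on.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(root, document_uri):
    """Map every element to its base URI, inheriting from the nearest
    ancestor-or-self element that carries an xml:base attribute."""
    result = {}

    def walk(elem, inherited):
        own = elem.get(XML_BASE)
        base = urljoin(inherited, own) if own is not None else inherited
        result[elem] = base
        for child in elem:
            walk(child, base)

    walk(root, document_uri)
    return result


# The <doc> wrapper is an assumption; the thread shows only the fragment.
doc = ET.fromstring(
    '<doc>'
    '<e1 xml:base="foo.html"><link target="#a"/></e1>'
    '<s2 xml:id="a"><link target="#a"/></s2>'
    '</doc>'
)

bases = base_uris(doc, "http://example.org/instance.xml")
targets = [urljoin(bases[link], link.get("target")) for link in doc.iter("link")]

for uri in targets:
    print(uri)
# http://example.org/foo.html#a
# http://example.org/instance.xml#a
```

The sketch stops once the absolute URIs exist; retrieval and same-document handling are deliberately left out, since that is the part under dispute.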
>
>>
>> The second link target MUST resolve to the ‘[instance]#a’ URI and would be the element ‘s2’ having that XML ID.
>>
>> This is basically vamping on the first example of the xml:base recommendation, so assuming that the definitions for the given vocabulary are specified the same way, then the same resolution rules would apply.
>>
>> The main point is the behavior of xml:base is supposed to be defined by the USING specification, not xml:base. The XML Base recommendation just defines some rules for the attribute and provides examples of how this is used in the most relevant customer specifications at the time, namely XLink.
>>
>> The focus on the meaning and reference of RFC 3986 is a red herring and not the issue to be discussed/resolved at all. The issue is making sure the specification for the vocabulary USING xml:base fully and completely defines the semantics of how that attribute should be interpreted for all vocabulary elements containing URIs.
>
> Any vocabulary that claims to be using URIs and is not using the rules for resolving URI references given in 3986 is not, in fact, using URIs.
Again, you’re talking about resolving URIs. The xml:base attribute is really about *creating* URIs that will later be resolved, and that, to me, is the root of the confusion.
XML Base is like a macro for authors, and that’s why I think it was modeled on HTML’s BASE element in the first place. The only reason for it to exist is to reduce typing and ease maintenance of the document *for authors*, not processors.
>
>>
>> Conceivably, the given vocabulary could define attributes that were aware of XML Base for URI resolution and some that were not.
>
> That strikes me as being a logical contradiction. If the latter class are said to be URIs, then the only way to resolve them is with reference to the base URI. If xml:base is being used, then the base URI is as given by the xml:base attributes and the rules in XML Base and 3986.
Nope. Attr1 says resolve according to xml:base processing rules, and Attr2 says resolve according to RFC 3986 rules without any understanding of xml:base. It might be unusual, but the defining vocabulary has the final say in how the content should be processed.
Mixed content is the first such example I can think of for a scenario like this, because if you had XML and non-XML in the same document, both using URIs, xml:base would apply only to the XML attributes in the vocabulary supporting XML Base.
Multiple XML vocabularies are another example, where one supports xml:base and one does not. You can’t just interpret XML Base as globally applicable. You must do what the ultimate vocabulary tells you to do.
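As a rough illustration of the Attr1/Attr2 case, here is a hypothetical vocabulary in which the specification decides, per attribute, whether xml:base is consulted. The attribute names, the `HONORS_XML_BASE` table, and the function `absolute_uri` are all invented for this sketch; it shows only that such a rule is mechanically expressible, not that a vocabulary defined this way would be considered a conforming use of URIs.

```python
from urllib.parse import urljoin

# Hypothetical per-attribute policy taken from an imagined vocabulary spec.
HONORS_XML_BASE = {
    "attr1": True,   # resolved against the in-scope xml:base
    "attr2": False,  # resolved against the document URI only
}


def absolute_uri(attr_name, value, xml_base, document_uri):
    """Build the absolute URI for an attribute value according to the
    vocabulary's policy for that attribute."""
    if HONORS_XML_BASE[attr_name] and xml_base is not None:
        base = urljoin(document_uri, xml_base)
    else:
        base = document_uri
    return urljoin(base, value)


doc_uri = "http://example.org/instance.xml"  # assumed document URI

print(absolute_uri("attr1", "#a", "foo.html", doc_uri))
# http://example.org/foo.html#a
print(absolute_uri("attr2", "#a", "foo.html", doc_uri))
# http://example.org/instance.xml#a
```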
Again, that’s to me why it’s defined at the content layer and not at the transfer/addressing layer.
>
>> However, this is totally up to the given vocabulary and independent of XML Base or RFC 3986.
>
> I think I disagree.
>
>> Where did the initial confusion actually come from?
>
> I believe it came from a question about whether “#foo” would point to something else in the same document or to something outside the document; it is not clear to me that anyone other than me takes seriously the possibility that the rules of RFC 3986 might say (or entail) that it is both.
>
> It is in many ways a tempest in a teapot, since it’s easy enough to avoid having any URIs with this unusual sort of double interpretation. But that requires understanding how the double interpretation arises (and that, in turn, requires taking seriously the words in the RFC).
Fair enough.
>
> ********************************************
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> cmsmcq@blackmesatech.com
> http://www.blackmesatech.com
> ********************************************
>
>
--
Andrew S. Townley <ast@atownley.org>
http://atownley.org