Re: [xml-dev] xml:base and fragments


From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: "Andrew S. Townley" <ast@atownley.org>
Date: Wed, 10 May 2017 06:28:07 -0600

> On May 10, 2017, at 5:49 AM, Andrew S. Townley <ast@atownley.org> wrote:
> 
> ...
>> On May 10, 2017, at 5:37 AM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>> 
>> 
>>> On May 9, 2017, at 5:00 PM, Andrew S. Townley <ast@atownley.org> wrote:
>>> 
>>> ...
>>> 
>>> The way I read the definition of “same-document” means that it’s relative to the currently appropriate base URI, so even Section 4.4 has to follow the scoping rules such that if you had something like:
>>> 
>>> <e xml:base="http://www.microsoft.com/bar.xml">
>>>   <bar href=""/>
>>> </e>
>>> 
>>> According to Section 4.4 and Section 5.1, this empty href is a “same document” reference to the active base URI, in this case http://www.microsoft.com/bar.xml, and NOT the URI used to retrieve the entity (5.1.3) and step 3 of the base URI resolution.
>>> 
>>> More specifically, if you expanded the above example:
>>> 
>>> <e xml:base="http://www.microsoft.com/bar.xml">
>>>   <bar href=""/>
>>>   <baz href="#node1"/>
>>> </e>
>>> 
>>> According to the RFC, the bar href must expand to "http://www.microsoft.com/bar.xml” and the baz href must expand to "http://www.microsoft.com/bar.xml#node1” making both bar and baz href “same-document” references to the active base URI defined in the content (resolution step 1) via the xml:base attribute.
>>> 
>>> In my reading of the spec, there is no situation where the above examples would declare either bar or baz href as being a “same-document” reference for the URI used to retrieve the entity, because this is not the base URI in scope within the element ‘e’.
>> 
>> Does that mean you believe they are not present within the document 
>> currently loaded?   
>> 
>> Or that the document currently loaded is not identified by the URI from
>> which it was loaded?
> 
> Neither.
> 
> In the first instance, I have no idea if the identified fragment ‘#node1’ is actually present in the document or not.  The spec says it should be and that I can’t go looking for it elsewhere because it’s a fragment.

I think there is a slight slip in your paraphrase.  The spec does not say 
it “should” be in the current document.  The spec says that the reference 
is “defined as” pointing to a fragment in the current document.  


> 
> The only things I do know from the XML I wrote in both examples and the interpretation of both RFC 3986 (all sections) and the XML Base Recommendation is that:
> 
> 1) there is an abstract or physical resource identified by the sequence of characters "http://www.microsoft.com/bar.xml”
> 
> 2) the sequence of characters "http://www.microsoft.com/bar.xml” should be the base URI used when evaluating all relative URI references within the scope of the element e having an xml:base attribute value equal to this sequence of characters

I have no idea what operations you have in mind as the meaning of 
the term “evaluating”.  So on this you get a free pass; you could mean
absolutely anything, including assigning a Wine Spectator-style 
score to them.

If you mean that the base URI you mention is the base URI used
when resolving the relative reference, then yes, I don’t recall
anyone in the discussion suggesting anything different. 
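For concreteness, here is a minimal sketch of that resolution step. It assumes Python's standard `urllib.parse.urljoin` as a stand-in for the RFC 3986 Section 5.2 algorithm; it is not what any particular XML processor does internally. Both references in the example resolve against the in-scope xml:base value:

```python
from urllib.parse import urldefrag, urljoin

# In-scope base URI from the xml:base attribute on element e (from the example).
XML_BASE = "http://www.microsoft.com/bar.xml"


def resolve(base: str, ref: str) -> str:
    """Resolve a URI reference against a base URI (RFC 3986, Section 5.2).

    urljoin() returns the base unchanged when ref is empty, so any fragment
    on the base is dropped first: per Section 5.2.2 the target's fragment
    comes from the reference, never from the base.
    """
    base_without_fragment, _ = urldefrag(base)
    return urljoin(base_without_fragment, ref)


for ref in ["", "#node1"]:
    print(repr(ref), "->", resolve(XML_BASE, ref))
```

The empty reference yields the base URI itself and `#node1` yields the base URI plus that fragment, matching the expansions given for bar and baz.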

> 
> 3) there is an abstract or physical resource identified by the sequence of characters "http://www.microsoft.com/bar.xml#node1"
> 
> 4) using the fragment rules defined by Section 3.5, the abstract or physical resource identified by the character sequence "http://www.microsoft.com/bar.xml#node1” is a secondary resource of the abstract or physical primary resource identified by the character sequence "http://www.microsoft.com/bar.xml”
> 
> 5) both the abstract or physical resource identified by the character sequence "http://www.microsoft.com/bar.xml” and the abstract or physical resource identified by the character sequence "http://www.microsoft.com/bar.xml#node1” pass the Same-Document Reference tests specified by Section 4.4 and referencing Section 5.1 using the base URI "http://www.microsoft.com/bar.xml” determined according to the precedence rules defined in Section 5.1
> 
> 6) As a result of #5, subsequent dereferences to either URI should not result in a new retrieval action for the abstract or physical resource having a base URI identified by the character sequence "http://www.microsoft.com/bar.xml”

Yes, so far.
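The same-document test of RFC 3986 Section 4.4 (inferences 5 and 6) can be approximated as follows. This is a sketch under stated assumptions: it uses plain string equality after resolution and does not attempt any of the Section 6 normalizations, and the function name is illustrative only.

```python
from urllib.parse import urldefrag, urljoin

# In-scope base URI on element e, taken from the example under discussion.
XML_BASE = "http://www.microsoft.com/bar.xml"


def is_same_document(base: str, ref: str) -> bool:
    """Approximate the RFC 3986 Section 4.4 test: the reference is a
    same-document reference if its target equals the base URI once the
    fragment is set aside. No Section 6 normalization is attempted."""
    base_without_fragment, _ = urldefrag(base)
    target = urljoin(base_without_fragment, ref)
    target_without_fragment, _ = urldefrag(target)
    return target_without_fragment == base_without_fragment


for ref in ["", "#node1", "http://www.microsoft.com/bar.xml#node1", "other.xml"]:
    print(repr(ref), is_same_document(XML_BASE, ref))
```

Note that the absolute form `http://www.microsoft.com/bar.xml#node1` passes the test just as `#node1` does, because the comparison is made against the base URI in scope, not against the form in which the reference was written.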

> 
> Note that #6 may not be the behavior expected by the content author,

Yes; clearly there is a non-empty set of people who do not want both
inference 2 and inference 6 in your list, and also two non-empty sets of
people (one thinking it obvious that inference 2 must take precedence,
the other that inference 6 must) who are unprepared to entertain the
notion that both are prescribed by the spec and thus both apply.  

> but it is, to the best of my understanding, what is implied by the RFC.  

And mine.  (And, I think, others’ as well.)

It may be worth pointing out that if the current document has the document
URI http://example.com/doc.xml, then 6 seems to some readers to entail
the proposition that the reference in question (whose absolute form is 
http://www.microsoft.com/bar.xml#node1) identifies a fragment named
“node1” in the current document, and thus necessarily identifies
the fragment http://example.com/doc.xml#node1.  


> This also means that for the purposes of processing the document and resolving references to "http://www.microsoft.com/bar.xml” within the scope of element e, there is no way to trigger actual retrieval of this resource through link targets with this character sequence.  

I don’t know what you mean; if you mean that inferences 2 and 6 together
seem to suggest that software can choose either to dereference the 
reference to #node1 by retrieving http://www.microsoft.com/bar.xml#node1
or by looking for node1 within http://example.com/doc.xml, and that the
author has in this case no reliable way of forcing one choice rather than 
another, then I agree.  (The author does have a reliable way, of course:
writing what the author means.)


> The only way to actually trigger retrieval of this resource would be to provide a reference to the "http://www.microsoft.com/bar.xml” URI outside the scope of element e.

Uh, I think there is a slip here.  Being outside element e surely doesn’t
guarantee that the base URI is not http://www.microsoft.com/bar.xml;
if we are in an HTML document, the ‘base’ element may specify that as
the base URI; if we are in an XML document, there may be another
xml:base attribute.  But with that trivial proviso, yes, I think you are correct
here.
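The proviso can be illustrated by tracking the in-scope base URI element by element. This is a sketch that models only xml:base attributes and the document URI (it ignores external entities and the other Section 5.1 steps); the elements f and qux and the document URI http://example.com/doc.xml are hypothetical additions for illustration. A reference outside e still resolves against http://www.microsoft.com/bar.xml when another xml:base sets that value:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urljoin

# ElementTree reports xml:base under the predefined XML namespace.
XML_BASE_ATTR = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical document: f and qux are illustrative additions, not from the thread.
DOC = """<root>
  <e xml:base="http://www.microsoft.com/bar.xml">
    <bar href=""/>
  </e>
  <f xml:base="http://www.microsoft.com/bar.xml">
    <baz href="#node1"/>
  </f>
  <qux href="#node1"/>
</root>"""


def resolve(base: str, ref: str) -> str:
    """RFC 3986 resolution via urljoin, dropping any fragment on the base."""
    return urljoin(urldefrag(base)[0], ref)


def resolve_hrefs(element, inherited_base, results):
    """Walk the tree, tracking the in-scope base URI for each element.

    An xml:base value is itself resolved against the parent's base URI;
    without one, the element inherits its parent's base URI.
    """
    own = element.get(XML_BASE_ATTR)
    base = inherited_base if own is None else resolve(inherited_base, own)
    href = element.get("href")
    if href is not None:
        results.append((element.tag, resolve(base, href)))
    for child in element:
        resolve_hrefs(child, base, results)
    return results


# The document URI is assumed to be the retrieval URI from the earlier example.
for tag, uri in resolve_hrefs(ET.fromstring(DOC), "http://example.com/doc.xml", []):
    print(tag, uri)
```

Here baz, outside e, resolves to http://www.microsoft.com/bar.xml#node1 because f carries its own xml:base, while qux falls back to the document URI.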

> 
> What is present in the “document currently loaded”, e.g. the octet stream (we don’t have the metadata),

I’m not sure why we don’t have “the” metadata.  

> is a set of identifiers which, by defining them within the content of that octet stream, gives them – the identifiers, and, by indirection the resources they identify – form and meaning in the universe because we have identified them, and we’re referencing them within the octet stream itself.
> 
> What is NOT present in the “document currently loaded”/octet stream is the corresponding octet stream(s) and meta-data that may result from any such retrieval action performed by dereferencing those identifiers independent of the same-document tests, which, I believe, is one of Michael’s original concerns.
> 
> Due to Section 4.4 and the criteria for “same-document”, the referenced #node1 target is expected to appear somewhere as a child of element e, but, again, due to external factors and potential commissions by the content author, it may not, in fact, be present.

> 
> Stated more simply, I may talk about a nice, juicy, tart and perfectly crisp apple within the context of this email, but that doesn’t mean either of us gets to eat it.  We only have an identifier for it.  We don’t have the result of retrieving that identifier, and, due to practical issues relating to embedding fruit as attachments in email, said apple does not appear in the text of the email itself.  Further, if we had a base URI for this email and defined the reference to the apple in terms of it, we’d be denied access to the apple forever because we “should not” make additional attempts to find it by resolving the reference.
> 
> In the second instance, there is no question that if any of the example documents I created were the result of dereferencing the character sequence "http://www.saxonica.com/foo.xml" for a retrieval action, then those octet sequences and metadata would be a physical resource identified by the character sequence "http://www.saxonica.com/foo.xml".
> 
> The question still remains, apparently, what is the correct character sequence that should be the value of the base URI within the scope of the element e.

?!  

> 
> However, I don’t think this question is ambiguous or hard to answer if you fully consider all of the text within the RFC and the XML Base Recommendation and connect the dots according to the rules defined therein.
> 
> The only possible value for the base URI used within the scope of the element e MUST be "http://www.microsoft.com/bar.xml” because the specs say so. Not me, not you, not Roy, not God, not Allah, but just those two documents, working together in the way that they were designed to allow future extensibility and still define a coherent architecture in the cases where that extension point was not required or used within the given resource being processed.

This peroration suggests that you believe you are arguing against 
people who deny that the relevant base URI is the one you mention
or who deny that the absolute form of the reference is the one you
mention.  I think if you reread carefully the writings of your interlocutors
you will find that this is not true.  

As for Roy Fielding, I asked a question about your interpretation of
RFC 3986, trying to understand what position you are trying to take.
If you choose to misunderstand my question, I don’t suppose I can
force you to answer.


> 
> I fail to see how RTF has a different interpretation in the exact case I described than I do.  It’s a pretty clear, one-word answer.
> 
> The rest of his answer is commentary on why this is true. What he says doesn’t change the fact that if a base URI is defined within the content of a resource, then relative references within a given resource within the scope of that base URI definition must be interpreted as relative to the defined base URI.

No.  The rest of his answer includes an explanation of why the same-document
rules allow the fragment in question to be retrieved from the current URI,
as described above in your inference number 6.  That was what I understood
you to be denying.  


> 
> The only material difference between the case cited from 2004 and the examples I gave was the syntax used for specifying the desired base URI within the content.  In the case of the 2004 question, it was HTML’s BASE tag, and in the case of our discussion, it’s the xml:base attribute.

No.  In the 2004 question, Paul Grosso provided both HTML base and xml:base examples.


> 
> At this point, I also feel that I’ve illustrated more than once how the two specifications work together to clearly identify the value of a base URI that should be applied given knowledge of the content vocabulary and the value of any relative URI.

Yes, I think you have.  And Michael Kay.  And a good many other people.

Since you began by suggesting that RFC 3986 was irrelevant to the case,
I had then the impression that you thought what it said was of no concern
in the example in question.

I’m happy to learn that I misunderstood your position.  



********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
