Re: [xml-dev] xml:base and fragments

Oh, oh… We’ve finally reached the “Roy T. Fielding” reference moment.  Haven’t been here for quite a while.

How exciting…

> On May 10, 2017, at 5:37 AM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
> 
> 
>> On May 9, 2017, at 5:00 PM, Andrew S. Townley <ast@atownley.org> wrote:
>> 
>> ...
>> 
>> The way I read the definition of “same-document” means that it’s relative to the currently appropriate base URI, so even Section 4.4 has to follow the scoping rules such that if you had something like:
>> 
>> <e xml:base="http://www.microsoft.com/bar.xml">
>> <bar href=""/>
>> </e>
>> 
>> According to Section 4.4 and Section 5.1, this empty href is a “same document” reference to the active base URI, in this case http://www.microsoft.com/bar.xml, and NOT the URI used to retrieve the entity (5.1.3) and step 3 of the base URI resolution.
>> 
>> More specifically, if you expanded the above example:
>> 
>> <e xml:base="http://www.microsoft.com/bar.xml">
>> <bar href=""/>
>> <baz href="#node1"/>
>> </e>
>> 
>> According to the RFC, the bar href must expand to "http://www.microsoft.com/bar.xml" and the baz href must expand to "http://www.microsoft.com/bar.xml#node1", making both bar and baz href “same-document” references to the active base URI defined in the content (resolution step 1) via the xml:base attribute.
>> 
>> In my reading of the spec, there is no situation where the above examples would declare either bar or baz href as being a “same-document” reference for the URI used to retrieve the entity, because this is not the base URI in scope within the element ‘e’.
> 
> Does that mean you believe they are not present within the document 
> currently loaded?   
> 
> Or that the document currently loaded is not identified by the URI from
> which it was loaded?

Neither.

In the first instance, I have no idea if the identified fragment ‘#node1’ is actually present in the document or not.  The spec says it should be and that I can’t go looking for it elsewhere because it’s a fragment.

The only things I do know from the XML I wrote in both examples and the interpretation of both RFC 3986 (all sections) and the XML Base Recommendation is that:

1) there is an abstract or physical resource identified by the sequence of characters "http://www.microsoft.com/bar.xml"

2) the sequence of characters "http://www.microsoft.com/bar.xml" should be the base URI used when evaluating all relative URI references within the scope of the element e having an xml:base attribute value equal to this sequence of characters

3) there is an abstract or physical resource identified by the sequence of characters "http://www.microsoft.com/bar.xml#node1"

4) using the fragment rules defined by Section 3.5, the abstract or physical resource identified by the character sequence "http://www.microsoft.com/bar.xml#node1" is a secondary resource of the abstract or physical primary resource identified by the character sequence "http://www.microsoft.com/bar.xml"

5) both the abstract or physical resource identified by the character sequence "http://www.microsoft.com/bar.xml" and the abstract or physical resource identified by the character sequence "http://www.microsoft.com/bar.xml#node1" pass the Same-Document Reference tests specified by Section 4.4 (which references Section 5.1), using the base URI "http://www.microsoft.com/bar.xml" determined according to the precedence rules defined in Section 5.1

6) as a result of #5, subsequent dereferences of either URI should not result in a new retrieval action for the abstract or physical resource having a base URI identified by the character sequence "http://www.microsoft.com/bar.xml"
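
As a minimal sketch of the reading above, assuming Python's standard urllib.parse as a stand-in for an RFC 3986 resolver: the helper names resolve and is_same_document and the extra reference "other.xml#node1" are illustrative only and come from neither spec. The check compares the resolved reference with the base URI after dropping any fragment, which is how I read Section 4.4.

```python
from urllib.parse import urljoin, urldefrag

def resolve(base_uri, reference):
    # Resolve a (possibly empty or fragment-only) reference against the base URI.
    return urljoin(base_uri, reference)

def is_same_document(base_uri, reference):
    # Same-document test as read from Section 4.4: ignoring the fragment,
    # the resolved target must be identical to the base URI.
    target, _fragment = urldefrag(urljoin(base_uri, reference))
    base, _ = urldefrag(base_uri)
    return target == base

# Base URI in scope inside element e, taken from its xml:base attribute.
BASE = "http://www.microsoft.com/bar.xml"

for ref in ("", "#node1", "other.xml#node1"):
    print(repr(ref), resolve(BASE, ref), is_same_document(BASE, ref))
# ''                http://www.microsoft.com/bar.xml          True
# '#node1'          http://www.microsoft.com/bar.xml#node1    True
# 'other.xml#node1' http://www.microsoft.com/other.xml#node1  False
```

Under this sketch, both href values from the example count as same-document references relative to the xml:base value, which is the premise for point 6.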

Note that #6 may not be the behavior expected by the content author, but it is, to the best of my understanding, what is implied by the RFC.  This also means that for the purposes of processing the document and resolving references to "http://www.microsoft.com/bar.xml" within the scope of element e, there is no way to trigger actual retrieval of this resource through link targets with this character sequence.  The only way to actually trigger retrieval of this resource would be to provide a reference to the "http://www.microsoft.com/bar.xml" URI outside the scope of element e.

What is present in the “document currently loaded”, i.e. the octet stream (we don’t have the metadata), is a set of identifiers.  Defining them within the content of that octet stream gives them (the identifiers and, by indirection, the resources they identify) form and meaning in the universe, because we have identified them and we’re referencing them within the octet stream itself.

What is NOT present in the “document currently loaded”/octet stream is the corresponding octet stream(s) and metadata that may result from any retrieval action performed by dereferencing those identifiers, independent of the same-document tests.  I believe this goes back to one of Michael’s original concerns.

Due to Section 4.4 and the criteria for “same-document”, the referenced #node1 target is expected to appear somewhere as a child of element e, but, again, due to external factors and potential omissions by the content author, it may not, in fact, be present.

Stated more simply, I may talk about a nice, juicy, tart and perfectly crisp apple within the context of this email, but that doesn’t mean either of us get to eat it.  We only have an identifier for it.  We don’t have the result of retrieving that identifier, and, due to practical issues relating to embedding fruit as attachments in email, said apple does not appear in the text of the email itself.  Further, if we had a base URI for this email and defined the reference to the apple in terms of it, we’d be denied access to the apple forever because we “should not” make additional attempts to find it by resolving the reference.

In the second instance, there is no question that if any of the example documents I created were the result of dereferencing the character sequence "http://www.saxonica.com/foo.xml" for a retrieval action, then those octet sequences and metadata would be a physical resource identified by the character sequence "http://www.saxonica.com/foo.xml".

The question still remains, apparently, what is the correct character sequence that should be the value of the base URI within the scope of the element e.

However, I don’t think this question is ambiguous or hard to answer if you fully consider all of the text within the RFC and the XML Base Recommendation and connect the dots according to the rules defined therein.

The only possible value for the base URI used within the scope of the element e MUST be "http://www.microsoft.com/bar.xml" because the specs say so. Not me, not you, not Roy, not God, not Allah, but just those two documents, working together in the way that they were designed to allow future extensibility and still define a coherent architecture in the cases where that extension point was not required or used within the given resource being processed.

> 
>> 
>> In this case:
>> 
>> <root>
>> <e xml:base="http://www.microsoft.com/bar.xml">
>>   <bar href=""/>
>>   <baz href="#node1"/>
>> </e>
>> <bar href=""/>
>> </root>
>> 
>> The 3 absolute URIs created according to the RFC and XML Base should be:
>> 
>> http://www.microsoft.com/bar.xml
>> http://www.microsoft.com/bar.xml#node1
>> http://www.saxonica.com/foo.xml
>> 
>> Varying the example somewhat:
>> 
>> <root xml:base="http://www.microsoft.com/bar.xml">
>> <e>
>>   <bar href=""/>
>>   <baz href="#node1"/>
>> </e>
>> <bar href=""/>
>> </root>
>> 
>> You still don’t have an issue here because the content-defined base URI is the root of the XML document and applies “globally”, e.g.:
>> 
>> http://www.microsoft.com/bar.xml
>> http://www.microsoft.com/bar.xml#node1
>> http://www.microsoft.com/bar.xml
>> 
>> You never should have a situation where http://www.saxonica.com/foo.xml is chosen as the base URI according to the RFC at all in the last example.
> 
> So are you saying that Roy Fielding was wrong when he responded 
> in [1] to Paul Grosso’s inquiry in [2] by saying that in cases analogous
> to the one MK describes, the reference is a same-document reference
> and can and should be retrieved from the document that has 
> been already loaded (in MK’s case the document at www.saxonica.com)?
> 
> [1] https://lists.w3.org/Archives/Public/uri/2004Jan/0009.html
> [2] https://lists.w3.org/Archives/Public/uri/2004Jan/0007.html

From your link #1:

> From: Roy T. Fielding <fielding@gbiv.com> 
> Date: Thu, 29 Jan 2004 17:52:20 -0800
> Cc: uri@w3.org 
> To: Paul Grosso <pgrosso@arbortext.com> 
> Message-Id: <F11C7199-52C6-11D8-A4B3-000393753936@gbiv.com> 
> 
> >>> Fair enough.  So the special interpretation of "#foo" in the resource
> >>> denoted by "http://www.example.com/blargh" is extended to
> >>> "blargh#foo" and "http://www.example.com/blargh#foo" as well.
> >>>
> >>> But it seems to me that (for good or ill) this also means that if a
> >>> base URI is available, say "http://www.example.com/stat/blargh", then
> >>> "#foo" now means "http://www.example.com/stat/blargh#foo".
> >>>
> >>> Is this a correct reading of 2396 bis?
> 
> Yes.

I fail to see how RTF has a different interpretation in the exact case I described than I do.  It’s a pretty clear, one-word answer.

The rest of his answer is commentary on why this is true. What he says doesn’t change the fact that if a base URI is defined within the content of a resource, then relative references within the scope of that base URI definition must be interpreted as relative to the defined base URI.

The only material difference between the case cited from 2004 and the examples I gave was the syntax used for specifying the desired base URI within the content.  In the case of the 2004 question, it was HTML’s BASE tag, and in the case of our discussion, it’s the xml:base attribute.

Otherwise, the result is the same.  Both mechanisms for specifying a base URI are only valid within the scope defined by the specification providing that mechanism.

With HTML, the scope of a BASE reference is the entire document, and, in the case of more than one being specified, the first encountered by the processor takes precedence [3].

With xml:base, the scope of a URI reference is within the implied scope of the element, the element’s parent or the document entity or external entity containing the element [4].
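
To make the xml:base scoping concrete, here is a rough sketch, assuming Python's standard xml.etree.ElementTree and urllib.parse. The function name resolve_hrefs is made up for illustration, and the retrieval URI http://www.saxonica.com/foo.xml is the one assumed throughout this thread. It walks the first multi-element example from earlier in this message and resolves each href against whichever base URI is in scope at that element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes the xml:base attribute under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def resolve_hrefs(element, base_uri):
    # An xml:base on this element replaces the inherited base URI for the
    # element and its descendants (it may itself be a relative reference).
    declared = element.get(XML_BASE)
    if declared is not None:
        base_uri = urljoin(base_uri, declared)
    results = []
    href = element.get("href")
    if href is not None:
        results.append((element.tag, urljoin(base_uri, href)))
    for child in element:
        results.extend(resolve_hrefs(child, base_uri))
    return results

DOC = """<root>
  <e xml:base="http://www.microsoft.com/bar.xml">
    <bar href=""/>
    <baz href="#node1"/>
  </e>
  <bar href=""/>
</root>"""

# Assumed: the URI used to retrieve the entity, acting as the outermost base.
RETRIEVAL_URI = "http://www.saxonica.com/foo.xml"

for tag, uri in resolve_hrefs(ET.fromstring(DOC), RETRIEVAL_URI):
    print(tag, uri)
# bar http://www.microsoft.com/bar.xml
# baz http://www.microsoft.com/bar.xml#node1
# bar http://www.saxonica.com/foo.xml
```

The sketch only computes absolute URIs; it says nothing about whether a processor should then retrieve them, which is the separate same-document question.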

We seem to have reached the point where the more words we both read, the wider the gap between our understandings grows, rather than narrowing as one would expect.

At this point, I also feel that I’ve illustrated more than once how the two specifications work together to clearly identify the value of a base URI that should be applied given knowledge of the content vocabulary and the value of any relative URI.

I’m happy to keep going, but I’m pretty sure I’m going to still keep vamping on the same theme:  to me, there is no ambiguity, and, thanks to your reference, this also aligns well with the intent of one of the significant authors of RFC 3986.

Cheers,

ast

[3] https://developer.mozilla.org/en/docs/Web/HTML/Element/base
[4] Section 4.2, https://www.w3.org/TR/xmlbase/#granularity
--
Andrew S. Townley <ast@atownley.org>
http://atownley.org


