Re: [xml-dev] xml:base and fragments

> On May 10, 2017, at 12:31 AM, Michael Kay <mike@saxonica.com> wrote:
> 
>> 
>> If you’ve previously retrieved string1 as the base URI, then what the paragraph actually says was that you shouldn’t need to trigger a new retrieval of string2 because both are equal and refer to the same resource content that you have already loaded.
>> 
>> If you haven’t actually retrieved string1 as the base URI *and* you’re dereferencing the target as part of a retrieval action, then you have no choice but to trigger a new retrieval action (because you don’t have it yet).
>> 
>> What I still didn’t understand was your:
>> 
>>>> * the reference is interpreted as a reference to the entity containing the reference, even if that entity is completely unrelated to anything you might find by retrieving the resource at http://A/. 
>> 
>> Because it must be related if you’ve already loaded it, and it by definition is a “same-document” reference.  How could it be completely unrelated?
>> 
>> 
> 
> Let's say a document located at http://www.saxonica.com/foo.xml contains (a) the attribute xml:base="http://www.microsoft.com/bar.xml", and (b) the attribute href="". Resolving @href against the base URI gives "http://www.microsoft.com/bar.xml". The expanded value of href is the same as the base URI, so the spec says this is a same-document reference, so under section 4.4 the @href is assumed to refer to http://www.saxonica.com/foo.xml, not to http://www.microsoft.com/bar.xml. In other words, xml:base is effectively ignored - technically it's used to expand the relative reference, but the document you get back isn't the document at that location.

That’s kinda what I thought you meant, and, unfortunately, I don’t see how this aligns with Section 4.4 without a more precise scoping of the xml:base and href relationships.

The way I read it, the definition of “same-document” is relative to the currently appropriate base URI, so even Section 4.4 has to follow the scoping rules. That means that if you had something like:

<e xml:base="http://www.microsoft.com/bar.xml">
  <bar href=""/>
</e>

According to Sections 4.4 and 5.1, this empty href is a “same-document” reference to the active base URI, in this case http://www.microsoft.com/bar.xml, and NOT to the URI used to retrieve the entity, which Section 5.1.3 makes only the third choice in the base URI resolution order.

More specifically, if you expanded the above example:

<e xml:base="http://www.microsoft.com/bar.xml">
  <bar href=""/>
  <baz href="#node1"/>
</e>

According to the RFC, the bar href must expand to "http://www.microsoft.com/bar.xml" and the baz href must expand to "http://www.microsoft.com/bar.xml#node1", making both the bar and baz hrefs “same-document” references to the active base URI, which is defined in the content (resolution step 1) via the xml:base attribute.
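As a rough illustration of that expansion, here is a minimal sketch using Python's standard urllib.parse. The helper name is_same_document is my own invention for this example; it only encodes the Section 4.4 wording (identical to the base URI aside from any fragment) and says nothing about which document a processor should then actually use.

```python
from urllib.parse import urldefrag, urljoin

# Base URI in scope inside <e>, taken from its xml:base attribute.
BASE = "http://www.microsoft.com/bar.xml"


def is_same_document(reference, base_uri):
    """Hypothetical helper: True if the resolved reference equals the
    base URI once any fragment has been set aside."""
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url


bar = urljoin(BASE, "")        # href="" on <bar>
baz = urljoin(BASE, "#node1")  # href="#node1" on <baz>

print(bar)
print(baz)
print(is_same_document("", BASE), is_same_document("#node1", BASE))
```

Both references come out as same-document references relative to the xml:base value, and a reference such as "other.xml" would not.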

In my reading of the spec, there is no situation where the above examples would declare either bar or baz href as being a “same-document” reference for the URI used to retrieve the entity, because this is not the base URI in scope within the element ‘e’.

In this case:

<root>
  <e xml:base="http://www.microsoft.com/bar.xml">
    <bar href=""/>
    <baz href="#node1"/>
  </e>
  <bar href=""/>
</root>

The 3 absolute URIs created according to the RFC and XML Base should be:

http://www.microsoft.com/bar.xml
http://www.microsoft.com/bar.xml#node1
http://www.saxonica.com/foo.xml

Varying the example somewhat:

<root xml:base="http://www.microsoft.com/bar.xml">
  <e>
    <bar href=""/>
    <baz href="#node1"/>
  </e>
  <bar href=""/>
</root>

You still don’t have an issue here, because the content-defined base URI is declared on the root element of the XML document and so applies “globally”, giving:

http://www.microsoft.com/bar.xml
http://www.microsoft.com/bar.xml#node1
http://www.microsoft.com/bar.xml

In this last example, the RFC should never lead you to choose http://www.saxonica.com/foo.xml as the base URI at all.
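To make the scoping argument concrete, here is a minimal sketch in Python of the reading described above: each href is resolved against the nearest xml:base in scope, falling back to the retrieval URI. The function name resolve_hrefs and the constants are assumptions made for this example, and the code illustrates my interpretation only; it is not a statement of what XML Base or RFC 3986 requires.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predeclared XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
RETRIEVAL_URI = "http://www.saxonica.com/foo.xml"

SCOPED = """<root>
  <e xml:base="http://www.microsoft.com/bar.xml">
    <bar href=""/>
    <baz href="#node1"/>
  </e>
  <bar href=""/>
</root>"""

GLOBAL = """<root xml:base="http://www.microsoft.com/bar.xml">
  <e>
    <bar href=""/>
    <baz href="#node1"/>
  </e>
  <bar href=""/>
</root>"""


def resolve_hrefs(xml_text, retrieval_uri):
    """Return the absolute URI of every href, in document order."""
    resolved = []

    def walk(element, base_in_scope):
        # xml:base is itself resolved against the base already in scope,
        # then applies to this element and all of its descendants.
        declared = element.get(XML_BASE)
        if declared is not None:
            base_in_scope = urljoin(base_in_scope, declared)
        href = element.get("href")
        if href is not None:
            resolved.append(urljoin(base_in_scope, href))
        for child in element:
            walk(child, base_in_scope)

    walk(ET.fromstring(xml_text), retrieval_uri)
    return resolved


for uri in resolve_hrefs(SCOPED, RETRIEVAL_URI):
    print(uri)
for uri in resolve_hrefs(GLOBAL, RETRIEVAL_URI):
    print(uri)
```

Under this reading the retrieval URI only shows up for the second bar element in the first document, which sits outside the element carrying xml:base.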

> 
>> 
>> What is your concern behind the observation?  Have you seen this behavior in the wild?
>> 
> 
> I have spent countless hours trying to decide what RFC 3986 and related specs mean, in order to ensure that specifications like XSLT and XPath are consistent with the RFC, and that my software products are conformant. My concerns are that (a) it's often very difficult to decide what they mean even after close study, and (b) sometimes when you work out what they mean, it's the last thing the user would expect.

I understand that, I hear you, and from watching your participation in this list since around 2005, I know it’s true.

I’ve done the same thing for my own software, and you’re absolutely right.  Sometimes it’s exceptionally difficult to determine what was meant.

User expectations? You mean we’re supposed to account for those as well??? ;)

As I alluded to before, in the past, I have also spent many hours poring over these and related specs for the same reasons as yourself.  I wear a slightly different hat these days, so I don’t spend as much time in RFCs and other specifications as I used to; however, I do write one or two on a regular basis that are in a similar vein to XPath and SQL in some ways, and I’ve written quite a few of my own public and private ones in the past.

A canonical suite of test cases is the best solution I’ve come up with to judge spec conformance. But that only works in conjunction with a process for adding new ones after some kind of multi-brain discussion to determine/remember what was meant when things are vague.
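As a small sketch of what such a suite might look like, the Python below checks a resolver against a handful of the reference-resolution examples from RFC 3986 Section 5.4.1. The table layout and the run_suite name are assumptions for illustration, and urllib.parse.urljoin simply stands in for whatever implementation is under test.

```python
from urllib.parse import urljoin

# Base URI and expected results taken from RFC 3986, Section 5.4.1.
RFC_BASE = "http://a/b/c/d;p?q"
CASES = [
    ("g", "http://a/b/c/g"),
    ("../g", "http://a/b/g"),
    ("?y", "http://a/b/c/d;p?y"),
    ("#s", "http://a/b/c/d;p?q#s"),
    ("", "http://a/b/c/d;p?q"),
]


def run_suite(resolve):
    """Return (reference, expected, actual) for every failing case."""
    failures = []
    for reference, expected in CASES:
        actual = resolve(RFC_BASE, reference)
        if actual != expected:
            failures.append((reference, expected, actual))
    return failures


failures = run_suite(urljoin)
print("failures:", failures)
```

New rows would be added to the table once a discussion like this one has settled what a vague passage was meant to say.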

In the cases of RFC 3986 and XML Base, we only have smart people and their command of the English language as the determining factors, and I’m glad we’re able to have these conversations in such a forum—regardless of which perspective proves to be correct. :)

Cheers,

ast
--
Andrew S. Townley <ast@atownley.org>
http://atownley.org


