Re: [xml-dev] xml:base and fragments


From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: Eliot Kimber <ekimber@contrext.com>
Date: Thu, 4 May 2017 10:21:59 -0600

> On May 4, 2017, at 7:07 AM, Eliot Kimber <ekimber@contrext.com> wrote:
> 
> The base URI of the ref element is “http://www.dictionary.com/a.html” therefore the fragment identifier *MUST* refer to an element within a.html.
> ...
>  
> From: "John P. McCaskey" <mailbox@johnmccaskey.com>
> Reply-To: <mailbox@johnmccaskey.com>
> Date: Thursday, May 4, 2017 at 7:53 AM
> To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
> Subject: [xml-dev] xml:base and fragments
>  
> A (somewhat Talmudic) discussion on the TEI mailing list (https://listserv.brown.edu/archives/cgi-bin/wa?A0=tei-l) regarding xml:base has come to a conclusion I want to run by XML-DEV readers.
> 
> TEI defines some XML attributes as RFC 3986-compliant URIs that fully honor xml:base.
> 
> The question is whether a value in this attribute of the form #fragment refers to an XML element in the document that contains the attribute or to a location specified by xml:base.
> 
> The (almost) consensus is that to resolve such a standalone fragment, xml:base values should be ignored.

As one of the exegetes involved in the Talmudic discussion in question, I hasten to note that I have not
seen anyone suggest that xml:base values be ignored in resolving a relative reference of the form #fragment.
I don’t know where that characterization comes from, but it’s not a correct paraphrase of any position I have 
seen taken in the discussion.

> 
> In this excerpt from example.xml
> 
>   <div xml:base="http://www.dictionary.com/a.html">
>     <p>
>       <ref target="#apple">Apple</ref>
>     </p>
>   </div>
> does #apple refer to an element in example.xml that has id="apple" or to http://www.dictionary.com/a.html#apple?
> 
> The first, right?

The verb “refer to” is not a technical term defined by RFC 3986, and it
is tightly defined neither in ordinary language nor (as far as I can tell) in
philosophy of language (that is, individual philosophers of language often
provide sharp definitions of the term, but not necessarily the same or
equivalent definitions).  So it’s not clear that the question as asked has
any definite answer.

I think the question can be paraphrased using defined terms as “Does
‘#apple’ identify the resource 

(1)  …/example.xml#apple

(where … denotes whatever absolute context was present in the URI
used to retrieve the document in the first place) or the resource 

(2)  http://www.dictionary.com/a.html#apple

?”

The answer, as I read RFC 3986, is “both”.  

If the creator of a document is not happy with that answer, then xml:base 
and fragment-only identifiers should be used with caution.

- We know that in the context given ‘#apple’ denotes URI (2), because
that’s the way xml:base and the rules for absolutizing URIs work.

- We also know that the reference to ‘#apple’ is a same-document 
reference, as that term is defined in RFC 3986, because the absolute
URI of the reference is (2), and the base URI is 

    http://www.dictionary.com/a.html

and the two are identical, aside from the fragment identifier.  And
that is how RFC 3986 defines same-document references.

- Of same-document URIs, RFC 3986 observes 

   When a same-document reference is dereferenced for a retrieval
   action, the target of that reference is defined to be within the same
   entity (representation, document, or message) as the reference;
   therefore, a dereference should not result in a new retrieval action.

- Since the target of the reference “is defined to be within” example.xml,
it seems that in context, ‘#apple’ identifies the element in example.xml
with xml:id=“apple”.  That is, (1).
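
The first two steps above can be sketched in a few lines of Python.  This
is only an illustration under stated assumptions, not a full RFC 3986
implementation: it relies on the standard library's urljoin and urldefrag,
the helper names resolve and is_same_document are hypothetical, and it
compares the URIs as plain strings without the normalization that 3986
permits before comparison.

```python
from urllib.parse import urljoin, urldefrag

# The base URI in scope for the <ref> element, as set by xml:base.
BASE = "http://www.dictionary.com/a.html"


def resolve(reference: str, base_uri: str) -> str:
    """Absolutize a URI reference against the base URI in scope."""
    return urljoin(base_uri, reference)


def is_same_document(reference: str, base_uri: str) -> bool:
    """Rough same-document test in the sense of RFC 3986, section 4.4.

    The reference is resolved against the base URI, and the result is
    compared with the base URI, ignoring any fragment on either side.
    Assumption: simple string comparison, no case or percent-encoding
    normalization.
    """
    target = urldefrag(urljoin(base_uri, reference)).url
    base = urldefrag(base_uri).url
    return target == base


if __name__ == "__main__":
    # The fragment-only reference resolves to URI (2) ...
    print(resolve("#apple", BASE))
    # ... and is nevertheless a same-document reference.
    print(is_same_document("#apple", BASE))
    # The same holds when the reference is written out in absolute form.
    print(is_same_document("http://www.dictionary.com/a.html#apple", BASE))
    # A reference to a different resource is not a same-document reference.
    print(is_same_document("b.html#apple", BASE))
```

Note that the sketch says nothing about where the target is to be found;
that follows from the passage of 3986 quoted above, not from the
resolution algorithm.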

This all makes some sense if one remembers that the initial purpose
of the HTML ‘base’ element, whose functionality xml:base is intended
to replicate, was to provide an in-document representation for the
document’s URI, for cases where that information is not available
from the context.  Neither HTML 4.01 nor xml:base actually assigns
the meaning “this is a URI for this document” to the construct, so
it’s not necessarily abusive to use it in other ways.  But the discussion
of same-document references in 3986 does have the effect of 
defining the targets of any URIs identical to the base URI (aside 
from the fragment component) as being inside the current document.

N.B. the discussion here applies even if the reference in question is 
absolute, not relative — it is not a question of ignoring xml:base when 
resolving ‘#apple’; the statements in 3986 apply equally to 
‘http://www.dictionary.com/a.html#apple’.

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************

Follow-Ups:
- Re: [xml-dev] xml:base and fragments
  - From: "Andrew S. Townley" <ast@atownley.org>
- Re: [xml-dev] xml:base and fragments
  - From: Eliot Kimber <ekimber@contrext.com>

References:
- xml:base and fragments
  - From: "John P. McCaskey" <mailbox@johnmccaskey.com>
- Re: [xml-dev] xml:base and fragments
  - From: Eliot Kimber <ekimber@contrext.com>
