OASIS Mailing List Archives

   Re: [xml-dev] XPointer and XML Schema


On Thu, 2002-07-18 at 15:57, Henry S. Thompson wrote:

> Sorry not to be clearer, let me try to be as precise as I can.
>    *source*---An XML document containing a remote absolute http-scheme
>    URI reference (call this *ref*) which includes a (shortform) fragment
>    identifier XYZZY (call this the *idref*)

There may be cases where there is no XML source, then (the reference may
be located in an HTML document or entered in the URL entry area of a
browser).

>    *user agent*---The application/machine which issues the GET request for
>    *ref*
>    *server*---The application handling the GET request at the machine
>    identified by domain name part of *ref*
>    *target document*---The XML (i.e. the *server* believes it is of
>    mime type text/xml or application/xml or . . ., given any accept
>    header parameterisations sent along with the GET for *ref*)
>    document identified by *ref*, ignoring the fragment identifier part
>    thereof, as returned by the *server* in reply to the GET request
>    for *ref*
>    *TDI*---The representation of the infoset of the *target document*
>    constructed by the *user agent*
>    *intended target*---The element information item in *TDI* intended
>    by the author of *source* as the referent of *ref* (including *idref*)
>    *actual target*---The element information item in *TDI* identified
>    by the *user agent* as the referent of *ref* by interpreting
>    *idref* as a shortform xpointer
>    *supplementary resources*---Resources involved in the construction
>    by the *user agent* of the *TDI*.  These may be identified by
>    absolute or relative URI references.  Other things being equal,
>    *ref* will serve as the base URI for relative URI refs.  What these
>    are depends on the *target document* (obviously), the *user
>    agent*'s choice of processing done to construct the *TDI* --
>    minimal non-validating parsing, full validating parsing, or complete
>    non-validating parsing (i.e. parsing that processes all referenced
>    parameter entities), with or without schema validity assessment --
>    and the environment in which the *user agent* operates.
> So my basic argument is that what counts as an ID, and therefore
> what determines the *actual target*, depends crucially on the
> *supplementary resources*, and therefore on the *user agent* and its
> environment (that is, user parameterisation/policy specifications,
> catalogs, caches, proxies, etc.), so the *source* and *target document*
> necessarily underdetermine the *actual target*.
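
The underdetermination argument above can be illustrated with a minimal
Python sketch. It is only an illustration under stated assumptions: the
sample document, the function actual_target and its id_attributes
parameter are hypothetical, and id_attributes merely stands in for
whatever a DTD or schema would have told a given user agent about which
attributes are of type ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical target document: nothing in the text itself says which
# attribute, if any, is of type ID.
TARGET_DOCUMENT = """<doc>
  <section id="XYZZY">first candidate</section>
  <section name="XYZZY">second candidate</section>
</doc>"""


def actual_target(document_text, bare_name, id_attributes):
    """Return the first element with an ID-typed attribute equal to bare_name.

    id_attributes is an assumption of this sketch: it models the ID
    declarations the user agent obtained from its supplementary resources.
    """
    root = ET.fromstring(document_text)
    for element in root.iter():
        for attribute in id_attributes:
            if element.get(attribute) == bare_name:
                return element
    return None


# Same source, same target document, three different environments.
agent_a = actual_target(TARGET_DOCUMENT, "XYZZY", {"id"})    # resource declares "id" as ID
agent_b = actual_target(TARGET_DOCUMENT, "XYZZY", {"name"})  # resource declares "name" as ID
agent_c = actual_target(TARGET_DOCUMENT, "XYZZY", set())     # no supplementary resources
```

Here agent_a and agent_b resolve the same bare name to different
elements, and agent_c finds no target at all, although the reference and
the document are identical in all three cases.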

I think I get the picture, and I'd just like to illustrate it with a simple
example in my own words to make sure I get it right :-)

So, the user agent sends a request for the target document to the
server.

If the headers of the response from the server identify the target
document as being XML, the user agent knows that the fragment uses
XPointer syntax.

When this is the case, and when the syntax is shorthand, the user agent
knows that it must find the element identified by the "bare name" and
may need to construct a PSVI for this if it doesn't find the ID
information in the XML infoset of the document.

If the user agent has been instructed, programmed or configured to use
an alternative schema, it will use this alternative schema to build the
PSVI.

Otherwise it will look for a schema location in the target document.

If there is no schema location in the target document, it may use any
kind of black magic to find a schema (dereferencing the namespace URI
expecting to find a schema or a RDDL document or using any kind of
directory mechanism).

If it still has no schema, it may raise an error or a warning. Otherwise,
it will build the PSVI and try to match the bare name against the ID table.
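
The lookup order described above can be sketched as a small Python
function. This is a hedged illustration, not a description of any real
user agent: AgentConfig, choose_schema and namespace_lookup are
hypothetical names, and the sketch only decides which schema would be
used, without fetching it or building a PSVI.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentConfig:
    # Schema imposed by the user or by the agent's configuration (assumption).
    override_schema: Optional[str] = None
    # The "black magic" step: any mechanism mapping a namespace URI to a
    # schema location (RDDL, directory lookup, ...). Hypothetical hook.
    namespace_lookup: Optional[Callable[[str], Optional[str]]] = None


def choose_schema(config, schema_location, namespace_uri):
    """Return the schema the agent would use, or None if it finds none."""
    # 1. An alternative schema set by instruction or configuration wins.
    if config.override_schema is not None:
        return config.override_schema
    # 2. Otherwise use the schema location given in the target document.
    if schema_location is not None:
        return schema_location
    # 3. Otherwise try whatever lookup mechanism the agent supports.
    if config.namespace_lookup is not None and namespace_uri is not None:
        return config.namespace_lookup(namespace_uri)
    # 4. Nothing found: the caller may raise an error or a warning.
    return None
```

A None result corresponds to the "error or warning" case; any other
result is the schema against which the PSVI would be built before the
bare name is looked up.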

> <skip/>
> > Not really. When I say that I want to access the anchor "boo" per the
> > (X)HTML naming system, the rules are set by the server.
> Um, you just went to some lengths to argue it was the *user agent*,
> not the *server*, which interprets fragIDs -- why change now?  The
> only thing the *server* contributes are the resource as such and its
> mime type.

You caught me :-) I meant that the semantics of the fragment ID are set
by the DTD, which is on *a* server, but your terminology is far better.

> > > The _user_ does that by setting up the processing environment, in
> > > either case.
> > 
> > What do you mean?
> I hope the clarifications above now make this clear.  *User agents*
> typically enable a wide range of user control over their behaviour,
> and questions such as whether or not to validate, whether or not to
> chase parameter entity references, whether or not to use a proxy, may
> all be under user control.  The proxy point is particularly important
> -- if I am running without network access, the presence or absence of
> a *supplementary resource* such as a DTD in my cache may well
> determine whether my reference goes through or not.
> So, bottom line: should we _also_ consider providing some _author_
> input into the control of *supplementary resource* determination?  If
> so, where should it go and whose (i.e. which W3C REC's) job is it to
> say how this works?
> My answer: Yes, but not in the fragId and it's not the XPointer REC's
> job.  These questions are clearly the responsibility of XML Processing
> Model REC (forthcoming, I hope), in my opinion.  Note of course there
> are typically at least _two_ authors involved, which is another reason
> why putting it in the fragID is a bad idea.

My point was not to say which spec this belongs to, but to make sure that
the purpose really was to leave the behavior undefined to a certain point!

> Final note:  the 99.99% case, for both DTDs and Schemas, is that all
> sensible *user agents* will do the same thing, and it will be what
> people expect, namely:
>   1a) If there's a DOCTYPE, process as much of it as you can get
>       access to looking for ID declarations, and use them during
>       parsing to identify possible anchors;
>   2a) If there's an xsi:schemaLocation attribute, use it to get a
>       schema doc and schema validity assess using it;
>   2b) Otherwise if the doc elt is in a namespace and there's a
>       schema doc accessible via the namespace URI, ditto.
> People will choose their *user agents* just as they do now, namely on a
> combination of ubiquity and functionality.  Let's hope the market
> decides XPointer functionality is useful and we get *user agents* that
> do all three of the above.
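
As a rough illustration of steps 1a, 2a and 2b above, here is a minimal
Python sketch that inspects a document and reports which steps apply.
The function name id_discovery_plan and the shape of its result are
hypothetical; the sketch assumes that only the document element is
checked for xsi:schemaLocation, detects a DOCTYPE with a crude substring
test, and does not fetch or process any DTD or schema.

```python
import xml.etree.ElementTree as ET

# Expanded name of the xsi:schemaLocation attribute as ElementTree reports it.
XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def id_discovery_plan(document_text):
    """List the applicable steps as (step, detail) tuples, in order."""
    plan = []
    # 1a) A DOCTYPE is present: its ID declarations should be processed.
    #     Crude check; it would also match the string inside a comment.
    if "<!DOCTYPE" in document_text:
        plan.append(("1a", "DOCTYPE"))
    root = ET.fromstring(document_text)
    hint = root.get(XSI_SCHEMA_LOCATION)
    if hint:
        # 2a) xsi:schemaLocation holds whitespace-separated pairs of
        #     namespace URI and schema document location.
        tokens = hint.split()
        plan.append(("2a", list(zip(tokens[0::2], tokens[1::2]))))
    elif root.tag.startswith("{"):
        # 2b) No hint, but the document element is in a namespace:
        #     the namespace URI is the place to look for a schema.
        plan.append(("2b", root.tag[1:].split("}", 1)[0]))
    return plan
```

For a document with both a DOCTYPE and an xsi:schemaLocation attribute
the plan contains steps 1a and 2a; for a namespaced document with
neither, it contains only step 2b.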

I do not completely buy this argument. Going to the extreme in this
direction, you could say that people will choose their user agent for
its features and there is no need to define a syntax for HTML...
Leaving too much latitude to the implementers and vendors of user agents
isn't always such a great idea IMO...

Thanks for these clarifications!


See you in San Diego.
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema

