OASIS Mailing List Archives

 



Re: SAX 2.0 enhancement proposal



> From: "Norman Walsh" <ndw@nwalsh.com>
> / David Brownell <david-b@pacbell.net> was heard to say:
> | Once the relative URI is interpreted relative to that location,
> | it's open season -- map the ID to /dev/null, to NUL:, to an error,
> | to a local cache, to whatever.  But that's AFTER it's been
> | interpreted relative to the location of the resource etc.
> 
> Then you didn't respond to the most relevant part of my original
> reply. 

Too long to quote in full ... I did respond to what I saw as
the most relevant part:

* > If the system identifier that the parser finally winds up using is a
* > relative URI reference, it's clear that it's relative to the base URI
* > of the containing entity. As I said before, I don't think that's in
* > dispute.
* 
* But the purpose of that proposed change is exactly to dispute that ... :)
* It's to enable interpretation with respect to something else.  If it weren't,
* then no change would be necessary.  QED.

If you are thinking "early vs late" is the key issue, I can't see "late" as
being conformant with the XML spec language:  in this case, it'd be
done only to support a nonconformant relative URI interpretation.


>    Why is it the case that an external resolver must only see the
> absolute URI?  I don't understand from where you draw the conclusion
> that early normalization is a requirement.

Terminology may be an issue.  I'm used to "normalization" relating
to character level manipulation (as in section 6.2 of that catalog draft,
which recaps standard rules for "% escaping"), and actually ignored the
issue of when that's done in a SAX environment. [1]

I'm also used to "resolution" being the process of mapping an identifier
(URI, say) to some concrete data that can be "read" (from a location),
though since the term is overloaded (== confusing when you have multiple
contexts) it's good to avoid using the term.  Processes involved in such
resolution can include redirection/remapping (as for example from some
"http://..." URL, as described in erratum E3) and other forms of substitution.
EntityResolver handles that process just fine.

The XML spec first uses the term "resolve" in 4.7 to say that a processor
"may additionally resolve..." external IDs of notations, and that usage seems
to be consistent with what I described above.

In this discussion I've also tried to use the word "interpretation" to refer
to the process of combining a base URI and a relative URI to produce
another URI -- without resolving either one.  Purely text manipulation,
as described in RFC 2396 ... sections 5.1.2 or 5.1.3 in this case, since
neither xml:base nor any other in-document embedding can apply, and
section 5.2 (which demonstrates that the term "resolve" is overloaded;
that RFC doesn't actually address the "fetch data" part of resolution, so
it felt free to use that sense of the term).
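
To make "interpretation" concrete, here is a minimal sketch of that
text-only step. It assumes Python's urllib.parse.urljoin, which follows
the successor RFC 3986 rather than RFC 2396 (the two agree for simple
cases like these); the example.com URIs and the interpret() helper are
made up for illustration. No network or file access happens anywhere.

```python
from urllib.parse import urljoin

def interpret(base_uri: str, system_id: str) -> str:
    """Combine a base URI and a possibly relative system identifier.

    Pure string manipulation: nothing is fetched or looked up.
    """
    return urljoin(base_uri, system_id)

# Hypothetical location of the resource holding the entity declaration.
base = "http://example.com/docs/book.xml"

print(interpret(base, "chapters/ch1.xml"))   # sibling-relative reference
print(interpret(base, "../dtd/book.dtd"))    # parent-relative reference
print(interpret(base, "http://other.example.org/x.ent"))  # already absolute
```

An already-absolute system identifier passes through unchanged, which
is why this step can always be applied before anything else sees the ID.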

Given that the XML 1.0 spec only says (repeating :) that "relative URIs are
relative to the location of the resource within which the entity declaration
occurs", I don't see that there's any scope for interpretation about exactly
which absolute URI is being identified by a SYSTEM identifier for any
parsed entity.  They identify exactly one absolute URI ... which can then
be resolved (including redirection/remapping/...) at will.
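
The ordering I'm describing can be sketched as two separate steps:
interpret first, remap afterwards. This is only an illustration, not
SAX code; the CATALOG table, its entries, and the function name are
hypothetical, and the remapping stands in for whatever an
EntityResolver might choose to do.

```python
from urllib.parse import urljoin

# Hypothetical remapping table, keyed by ABSOLUTE URI.
CATALOG = {
    "http://example.com/dtd/book.dtd": "file:///usr/share/xml/book.dtd",
}

def interpret_then_resolve(base_uri, system_id, catalog=CATALOG):
    """Return the URI to actually read from.

    Step 1 (interpretation): make the system ID absolute against the
    location of the resource containing the declaration.
    Step 2 (resolution): remap that absolute URI at will, or keep it
    when the table has no entry for it.
    """
    absolute = urljoin(base_uri, system_id)
    return catalog.get(absolute, absolute)

base = "http://example.com/docs/book.xml"
print(interpret_then_resolve(base, "../dtd/book.dtd"))   # remapped to the local copy
print(interpret_then_resolve(base, "chapters/ch1.xml"))  # no entry, so unchanged
```

The remapping only ever sees the one absolute URI that the declaration
identifies, so it can't change what a relative reference means.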

- Dave

[1] I think that most SAX parsers don't do normalization except as
    part of actual data fetching, in the sense that requests to web servers
    must provide legal URIs (%-escaped), while ones going to filesystems
    generally need to get rid of %-escapes, so there's no benefit to
    normalizing any earlier (and there is a visible cost, not restricted to
    the proliferation of strange looking strings).

    But that might be worth debating, in a separate thread; the E78
    erratum text is unclear about whether that normalization is to be
    done by the document generator or something else.
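
    As a rough illustration of the two fetch-time directions mentioned
    in [1], assuming Python's urllib.parse (the helper names and sample
    paths are made up, and real parsers may well differ):

```python
from urllib.parse import quote, unquote, urlsplit

def path_for_web_request(path: str) -> str:
    """%-escape characters that are not legal in a URI path ('/' is kept)."""
    return quote(path)

def path_for_filesystem(file_uri: str) -> str:
    """Take the path out of a URI and undo its %-escapes for local file access."""
    return unquote(urlsplit(file_uri).path)

print(path_for_web_request("/docs/my file.xml"))         # space becomes %20
print(path_for_filesystem("file:///tmp/my%20file.xml"))  # %20 becomes a space
```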