Re: SAX 2.0 enhancement proposal
- From: David Brownell <david-b@pacbell.net>
- To: xml-dev@lists.xml.org
- Date: Fri, 15 Jun 2001 08:39:17 -0700
> From: "Norman Walsh" <ndw@nwalsh.com>
> / David Brownell <david-b@pacbell.net> was heard to say:
> | Once the relative URI is interpreted relative to that location,
> | it's open season -- map the ID to /dev/null, to NUL:, to an error,
> | to a local cache, to whatever. But that's AFTER it's been
> | interpreted relative to the location of the resource etc.
>
> Then you didn't respond to the most relevant part of my original
> reply.
Too long to quote in full ... I did respond to what I saw as
the most relevant part:
* > If the system identifier that the parser finally winds up using is a
* > relative URI reference, it's clear that it's relative to the base URI
* > of the containing entity. As I said before, I don't think that's in
* > dispute.
*
* But the purpose of that proposed change is exactly to dispute that ... :)
* It's to enable interpretation with respect to something else. If it weren't,
* then no change would be necessary. QED.
If you are thinking "early vs late" is the key issue, I can't see "late" as
being conformant with the XML spec language: in this case, it'd be
done only to support a nonconformant relative URI interpretation.
> Why is it the case that an external resolver must only see the
> absolute URI? I don't understand from where you draw the conclusion
> that early normalization is a requirement.
Terminology may be an issue. I'm used to "normalization" referring
to character-level manipulation (as in section 6.2 of that catalog draft,
which recaps the standard rules for "%-escaping"), and I actually ignored the
issue of when that's done in a SAX environment. [1]
I'm also used to "resolution" being the process of mapping an identifier
(URI, say) to some concrete data that can be "read" (from a location),
though since the term is overloaded (== confusing when you have multiple
contexts) it's good to avoid using the term. Processes involved in such
resolution can include redirection/remapping (as for example from some
"http://..." URL, as described in erratum E3) and other forms of substitution.
EntityResolver handles that process just fine.
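A minimal sketch of that division of labor in SAX terms. As I read the SAX2 javadoc, a parser must fully resolve a URL system identifier before reporting it, so by the time resolveEntity() runs there is only an absolute URI left to remap. The class name, the map()/blank() helpers and the example locations are hypothetical; only EntityResolver and InputSource are standard SAX.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: it only remaps absolute URIs that the parser has
// already interpreted; it never reinterprets them against another base.
public class RemappingResolver implements EntityResolver {
    private final Map<String, String> remapped = new HashMap<>();
    private final Set<String> blanked = new HashSet<>();

    // Redirect an absolute URI to another location, e.g. a local cache copy.
    public void map(String absoluteUri, String replacementUri) {
        remapped.put(absoluteUri, replacementUri);
    }

    // Treat an absolute URI as an empty entity ("map it to /dev/null").
    public void blank(String absoluteUri) {
        blanked.add(absoluteUri);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (blanked.contains(systemId)) {
            InputSource empty = new InputSource(new StringReader(""));
            empty.setPublicId(publicId);
            empty.setSystemId(systemId);
            return empty;
        }
        String target = remapped.get(systemId);
        if (target == null) {
            return null; // null: the parser opens systemId itself
        }
        InputSource source = new InputSource(target);
        source.setPublicId(publicId);
        return source;
    }
}
```

The point of the sketch is what it does not do: there is no base URI anywhere in it, so it has no way to change which absolute URI a SYSTEM identifier names, only where the data for that URI comes from.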
The XML spec first uses the term "resolve" in 4.7 to say that a processor
"may additionally resolve..." external IDs of notations, and that usage seems
to be consistent with what I described above.
In this discussion I've also tried to use the word "interpretation" to refer
to the process of combining a base URI and a relative URI to produce
another URI -- without resolving either one. Purely text manipulation,
as described in RFC 2396 ... sections 5.1.2 or 5.1.3 in this case, since
neither xml:base nor any other in-document embedding can apply, and
section 5.2 (which demonstrates that the term "resolve" is overloaded;
that RFC doesn't actually address the "fetch data" part of resolution, so
it felt free to use the term in that sense).
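A small sketch of "interpretation" in that purely textual sense, assuming java.net.URI, whose resolve() is documented as following RFC 2396 section 5.2. The Interpret class, the interpret() helper and the example.com DTD location are hypothetical; nothing here is fetched.

```java
import java.net.URI;

// Hypothetical helper: combine the base URI of the containing entity with a
// SYSTEM identifier. Pure string manipulation; no data is fetched.
public class Interpret {
    static String interpret(String baseOfContainingEntity, String systemId) {
        URI base = URI.create(baseOfContainingEntity);
        // resolve() returns systemId unchanged if it is already absolute,
        // otherwise merges paths and normalizes "." and ".." segments.
        return base.resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // Hypothetical: a DTD at this URI declares
        //   <!ENTITY chap1 SYSTEM "ents/chap1.xml">
        String dtd = "http://www.example.com/dtds/doc.dtd";
        System.out.println(interpret(dtd, "ents/chap1.xml"));
        // http://www.example.com/dtds/ents/chap1.xml
        System.out.println(interpret(dtd, "../common/iso-lat1.ent"));
        // http://www.example.com/common/iso-lat1.ent
    }
}
```

The base used is the DTD's own location, since the declaration occurs there, not the location of whatever document happens to reference that DTD.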
Given that the XML 1.0 spec only says (repeating :) that "relative URIs are
relative to the location of the resource within which the entity declaration
occurs", I don't see that there's any scope for interpretation about exactly
which absolute URI is being identified by a SYSTEM identifier for any
parsed entity. They identify exactly one absolute URI ... which can then
be resolved (including redirection/remapping/...) at will.
- Dave
[1] I think that most SAX parsers don't do normalization except as
part of actual data fetching, in the sense that requests to web servers
must provide legal URIs (%-escaped), while ones going to filesystems
generally need to get rid of %-escapes, so there's no benefit to
normalizing any earlier (and there is a visible cost, not restricted to
the proliferation of strange-looking strings).
But that might be worth debating, in a separate thread; the E78
erratum text is unclear about whether that normalization is to be
done by the document generator or something else.
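To make the footnote concrete, a sketch assuming java.net.URI (the EscapeAtFetch class and its helper names are hypothetical): a single URI value yields both the %-escaped text a web server request needs and the decoded path a filesystem call needs, so escaping any earlier than fetch time gains nothing.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EscapeAtFetch {
    // Text sent to a web server: a legal URI with %-escapes.
    static String requestText(URI uri) {
        return uri.toASCIIString();
    }

    // Name handed to a filesystem: %-escapes decoded back to characters.
    static String fileName(URI uri) {
        return uri.getPath();
    }

    public static void main(String[] args) throws URISyntaxException {
        // The multi-argument constructors %-escape the space themselves.
        URI web = new URI("http", "www.example.com", "/docs/my file.xml", null);
        URI local = new URI("file", null, "/tmp/my file.xml", null);
        System.out.println(requestText(web)); // http://www.example.com/docs/my%20file.xml
        System.out.println(fileName(local));  // /tmp/my file.xml
    }
}
```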