On Sat, 2002-07-20 at 15:19, Uche Ogbuji wrote:
> So if URLs are not really about location, and are not really about retrieval
> (or at least are not really meant to be about either), doesn't this reduce
> some of the supposed divide between URLs and URIs?
> Maybe the fact that I consider URLs an identifier is why I've never understood
> why they are able to launch 3000-message threads. I just don't see that
> magnitude of problem anywhere.
I can think of at least four algorithms to apply to most URIs/URLs:
All URIs share an absolutely minimal parse: by the definition of URI,
the first particle of each indicates the scheme. When the scheme is
identified, the rest of the URI can be passed off to a scheme-specific
part for further operations.
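
A minimal sketch of that scheme-only parse, in Python. The function name split_scheme is mine, purely for illustration, and the scheme grammar used is the one I understand RFC 2396 to give (a letter followed by letters, digits, "+", "-", or "."):

```python
import re

# Scheme grammar assumed from RFC 2396: a letter, then letters,
# digits, "+", "-", or "."; the first ":" ends the scheme.
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$", re.DOTALL)

def split_scheme(uri):
    """Return (scheme, scheme_specific_part) for a URI string.

    The scheme is lowercased because scheme names are
    case-insensitive; the rest is returned untouched so that a
    scheme-specific handler can decide what to do with it.
    """
    m = _SCHEME.match(uri)
    if m is None:
        raise ValueError("no scheme found in %r" % uri)
    return m.group(1).lower(), m.group(2)
```

Nothing beyond the scheme is interpreted here; split_scheme("urn:isbn:0451450523") simply hands "isbn:0451450523" back to whatever knows about urn.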
For URNs, "resolution" may not be meaningful, but it does have meaning
for URIs in the "common internet format"
[scheme]:[hi][user@][dns][:port][path], where scheme is the scheme name,
hi is the hierarchy identifier (indicating whether there is a path
part), user is a username, identifiable by the trailing @, dns is a fully
qualified domain name, port is a TCP (only, apparently) port number,
identifiable by the colon, and the path, if present, is the rest of the
stuff. The data scheme isn't hierarchical, or in the common format;
mailto is common format, but not hierarchical; most of the common
protocols are accessible via common format, and are hierarchical. These
latter are also by far the most commonly encountered.
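
The template above can be applied mechanically, roughly as in the following Python sketch. The names parse_common and _COMMON are hypothetical, and treating "//" as the hierarchy identifier is my assumption. It only applies the template; it does not check that a given scheme really uses this format, so it will happily carve up strings from schemes that don't.

```python
import re

# [scheme]:[hi][user@][dns][:port][path], with "//" assumed for hi.
_COMMON = re.compile(
    r"^(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*):"
    r"(?P<hi>//)?"                  # hierarchy identifier
    r"(?:(?P<user>[^@/]*)@)?"       # username, marked by the trailing @
    r"(?P<dns>[^:/@]*)"             # domain name
    r"(?::(?P<port>\d+))?"          # port, marked by the colon
    r"(?P<path>/.*)?$"              # the rest of the stuff
)

def parse_common(uri):
    """Return a dict of template parts, or None if the URI doesn't fit."""
    m = _COMMON.match(uri)
    if m is None:
        return None
    parts = m.groupdict()
    # A path without the hierarchy identifier doesn't fit the template.
    if parts["hi"] is None and parts["path"] is not None:
        return None
    if parts["port"] is not None:
        parts["port"] = int(parts["port"])
    return parts
```

With this, "mailto:amy@example.org" comes back with a user and a dns but no hi and no path (common format, not hierarchical), while "data:text/plain,hello" is rejected because it has something path-like without the hierarchy identifier.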
But all of the things that you can apply sensible behavior to are, for
the most part, URLs. Does it matter? It doesn't have to, I suppose,
but in practice, the scheme (and whether you recognize it or not) gives
instruction on how to complete parsing, how to resolve or canonicalize
the URL, and how to normalize or decode it. It also, by the by,
indicates the algorithm to use to retrieve/access/locate the resource.
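
To make the "scheme gives instruction" point concrete, here is a rough Python sketch of a per-scheme dispatch table. HANDLERS, normalize, and resolve are names invented for this example, and the only handler filled in is http, using the standard urllib.parse functions. The normalization shown (lowercase the host, supply "/" for an empty path) is one plausible choice, not a complete or authoritative set of rules.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def _normalize_http(uri):
    p = urlsplit(uri)  # urlsplit already lowercases the scheme
    # Lowercase only the host[:port]; any userinfo keeps its case.
    userinfo, at, hostport = p.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((p.scheme, netloc, p.path or "/", p.query, p.fragment))

# Hypothetical registry: each recognized scheme supplies its algorithms.
HANDLERS = {
    "http": {"normalize": _normalize_http, "resolve": urljoin},
}

def _handler(uri):
    scheme = uri.split(":", 1)[0].lower()
    return HANDLERS.get(scheme)

def normalize(uri):
    """Normalize if the scheme is recognized; otherwise leave it alone."""
    h = _handler(uri)
    return uri if h is None else h["normalize"](uri)

def resolve(base, relative):
    """Resolve a relative reference; only possible for known schemes."""
    h = _handler(base)
    if h is None:
        raise ValueError("don't know how to resolve against %r" % base)
    return h["resolve"](base, relative)
```

An unrecognized scheme gets nothing but the opaque string back, which is about all one can do with an identifier whose scheme supplies no algorithms.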
Straying back into namespace land, there are several criticisms to make
of the namespace spec. Folks could probably live with a URL that
doesn't point at a real resource, but it's a URL that also can't be
resolved, normalized, or parsed in any way.
This points up a larger problem, I think. In the original URI
specification, Uniform Resource Identifiers are defined to be the
superset of Uniform Resource Locators and Uniform Resource Names.
Common W3C, and eventually IETF, usage has instead been "URLs carry
location semantics; URIs don't, even when they look like URLs." There
was already a location-algorithm-free means of specifying an identifier:
urn. More urn sub-schemes would have had to be created, but even this:
would neatly remove objections to the gutting of URLs. And this would
have been much nicer, for namespaces:
The latter retains the ability to administratively impose uniqueness on
URIs, without being less readable than common (and connotatively
I'm still really irritated with the recent specs and RFCs that use the
term "URI" in preference to "URL," when it's perfectly clear that they
mean something that has location semantics. What's the value in
blurring the distinction between the two? Why should anyone be happy to
see suggestions that URLs be created 404 from birth? Why shouldn't
folks demand that a URI being used for identification only somehow
indicate that fact within the URI? (Even something as
simple as a recommended faux hostname could do this--instead of using
"www.w3.org", we suggest that for namespaces one use "xmlns.w3.org" (a
machine that doesn't exist (I hope)), and for philosophical concepts,
use cloud-cuckoo.example.org, and for really just generic identifiers
and we don't know why we're creating them but URIs are cool, use
identifiers.example.com--it doesn't even have to be normative, just
Amelia A. Lewis firstname.lastname@example.org email@example.com
A hundred thousand lemmings can't be wrong.