xml-dev - URIs again and again ....

To: xml-dev@lists.xml.org
Subject: URIs again and again ....
From: Amelia A Lewis <amyzing@talsever.com>
Date: 28 Jul 2002 16:14:32 -0400
The tautology "Uniform Resource Identifiers uniformly identify resources"
is ... well, tautological.

What's a tautology?  A tautology is expressed in a tautological
statement.  What's a tautological statement?  It embodies a tautology.

That's kinda fun, you know.  Wanna go around again?  No?  That's okay, I
get motion sick too, sometimes.  Let's do something else instead.

Uniform. Resource. Identifier.

Okay, what makes them uniform?  Answer: each begins with an identifying
scheme.  This can always be recognized; look for the particle in front
of the colon.  Relative URIs can elide this portion; in that case,
though, context supplies the information that this is a URI, and also
supplies the "base" URI context.
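To make the "particle in front of the colon" concrete, here is a minimal
sketch using Python's standard urllib.parse module.  The BASE value and
the relative reference "notes/uri.html" are made up for illustration; they
are assumptions, not anything from the schemes' registrations.

```python
from urllib.parse import urlsplit, urljoin

def scheme_of(uri: str) -> str:
    """Return the scheme (the part before the first colon), or "" if absent."""
    return urlsplit(uri).scheme

# Hypothetical base URI that the surrounding context would supply.
BASE = "http://www.example.org/docs/index.html"

print(scheme_of("http://www.w3.org/"))            # absolute: scheme present
print(scheme_of("mailto:amyzing@talsever.com"))   # absolute: scheme present
print(scheme_of("notes/uri.html"))                # relative: no scheme at all

# A relative reference only becomes a full URI once a base is applied.
print(urljoin(BASE, "notes/uri.html"))
```

The relative reference carries no scheme of its own; it picks one up only
when resolved against the base that context provides.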

What else is uniform?  Well, they're always identifiers, and always
resources, but saying that just makes me start to feel dizzy.  The
official ones are registered with the IANA.  Oh, and some of them,
typically ones that use scheme names that are identical to common
resource-access protocols used on the internet, share a format.  But
that isn't required, in general.  However, one more uniformity: when the
scheme is registered, a description of it is included.  That description
gives a number of rules for usage, and in some cases (usually for things
that are called Uniform Resource Locators), a standard algorithm is
supplied for retrieving the resource identified.

Oh, and that's a uniformity that's uniform among the uniform resource
thingies.  They identify.  Identification implies a degree of
uniqueness, if that's not oxymoronic.  That is, it says that the blobby
thinglet we're trying to describe here points at a particular
whatchamacallit.  As an identifier, it isn't the thing-in-itself, it's
pointing at something.  That something is probably unique.  More or
less.  A part of the description includes some rules for determining
when one of these pointer-dealies points at the "same thing" as some
other pointer-dealie ... that is, the description includes a means of
determining identity, in the more computer-sciencey sense.

The description also probably includes some rules for who gets to be the
authority for creating identifiers.  I *can* identify something with the
urn:isbn: sub-scheme, and use some number I just made up.  It's not a
proper URI, though, because I'm not following the rules laid down in the
description for that scheme, which delegate naming authority in the ISBN
namespace to the folks that already own the namespace (publishers, in
cooperation, in this case).

It isn't clear to me that making the distinction between URN and URL is
useful, any more, especially with the introduction of catalog-enabled
applications for handling URIs.  It might be better to think of URIs as
ranging along a spectrum from name to location (in the world beyond
computers, my given name is enough of an identifier for my friends;
given + surname enough for folks who know me and not some other Amy
Lewis; amyzing@talsever.com is sort of a name (it's the name of my
mailbox and the name of my domain), and sort of a locator (proper
software knows how to use it to send me stuff); my street address (no,
I'm not planning on publishing it here, thank you very much) is much
more a locator (and five years ago, or possibly five years from now,
that locator wouldn't locate me), but you could argue that "the person
living at this address" is a naming, not a location (junk mail folks
seem to think so; I'm reluctant, though, to change my name to "Resident"
just in order to feel more personally involved in the mail that I
receive)).

Where were we?  Off the carousel, obviously; I think that must have been
the fun-house.  Oh, yeah.

What's a resource then?  If someone says that a resource is something
identified by a URI, I'm going to invite them down here for the state
fair, feed them popcorn, cotton candy, and the nastiest greasiest food I
can find, then put them on the tilt-a-whirl.  Alone.  And bribe the
attendant, for good measure.

A resource may be a digital thing, or a physical thing (we could get
really zen, or possibly advanced-physicistical, and talk about the
inherent identity of matter and energy, but I don't think that'll get us
much furtherer, do you?), or something as nebulous as an idea, or an
algorithm, or an emotion.  Maybe we should say that a resource is
something that can be patented?  Heck, it would certainly resolve a lot
of arguments, and is almost certainly true in the United States.  Then
we could use the patent: scheme ....

Sorry.  I don't know *what* came over me.  Okay.  Uniformity, identity,
and a more nebulous resource-ness.  Are we relatively well defined?

I have a problem with the use of namespaces, not because the resources
can't be retrieved, but because the namespace specification changes the
rules.  You can argue, as some folks here seem to do, that the location
algorithm associated with a particular scheme should work for any URI
that truly conforms to the definition/description, and I tend toward
that direction myself, but it isn't an argument that I want to make.  I
*do* want to make an argument that, if you use a particular scheme, then
you abide by the rules for determining identity for that scheme in so
far as that is practical.

What that means, for the widespread http: scheme, is that you follow the
rules of the "common internet format."  That means that
http://www.w3.org:80/ is the same as http://www.W3.org/ is the same as
http://www.w3.org/.  On the other hand, http://www.talsever.com/AmyZing/
is not the same as http://www.talsever.com/amyzing/ (and neither of them
identifies anything useful to you, since apart from email the domain only
resolves usefully behind my firewall, so resist the temptation to click
on the nice blue linkies).  The rules of the common
internet format say that each scheme has a default port, which you can
leave out or put in without changing identity, and each has a
fully-qualified domain name, which is case insensitive because of the
rules of DNS, and each has a path, which *is* case sensitive.
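The three rules just described (default port, case-insensitive host,
case-sensitive path) can be sketched in a few lines of Python.  This is an
illustration only: the names identity_key, same_resource, and
DEFAULT_PORTS are invented here, and the port table covers just the http:
scheme under discussion.

```python
from urllib.parse import urlsplit

# Assumption: only the http: scheme is handled in this sketch.
DEFAULT_PORTS = {"http": 80}

def identity_key(uri: str) -> tuple:
    """Reduce a URI to the pieces that matter for comparing identity."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()           # scheme names are case-insensitive
    host = (parts.hostname or "").lower()   # host names are case-insensitive
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    path = parts.path or "/"                # the path is left case-sensitive
    return (scheme, host, port, path, parts.query)

def same_resource(a: str, b: str) -> bool:
    return identity_key(a) == identity_key(b)

print(same_resource("http://www.w3.org:80/", "http://www.W3.org/"))
print(same_resource("http://www.W3.org/", "http://www.w3.org/"))
print(same_resource("http://www.talsever.com/AmyZing/",
                    "http://www.talsever.com/amyzing/"))
```

The first two comparisons come out equal and the third does not, because
only the path differs in case there.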

For the http: URI scheme, it might be instructive to look at the
following excerpt from RFC 2616:

3.2.2 http URL

   The "http" scheme is used to locate network resources via the HTTP
   protocol. This section defines the scheme-specific syntax and
   semantics for http URLs.

   http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

   If the port is empty or not given, port 80 is assumed. The semantics
   are that the identified resource is located at the server listening
   for TCP connections on that port of that host, and the Request-URI
   for the resource is abs_path (section 5.1.2). The use of IP addresses
   in URLs SHOULD be avoided whenever possible (see RFC 1900 [24]). If
   the abs_path is not present in the URL, it MUST be given as "/" when
   used as a Request-URI for a resource (section 5.1.2). If a proxy
   receives a host name which is not a fully qualified domain name, it
   MAY add its domain to the host name it received. If a proxy receives
   a fully qualified domain name, the proxy MUST NOT change the host
   name.

3.2.3 URI Comparison

   When comparing two URIs to decide if they match or not, a client
   SHOULD use a case-sensitive octet-by-octet comparison of the entire
   URIs, with these exceptions:

      - A port that is empty or not given is equivalent to the default
        port for that URI-reference;

        - Comparisons of host names MUST be case-insensitive;

        - Comparisons of scheme names MUST be case-insensitive;

        - An empty abs_path is equivalent to an abs_path of "/".

   Characters other than those in the "reserved" and "unsafe" sets (see
   RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

   For example, the following three URIs are equivalent:

      http://abc.com:80/~smith/home.html
      http://ABC.com/%7Esmith/home.html
      http://ABC.com:/%7esmith/home.html

-30-
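The percent-encoding rule in that excerpt is the least obvious one, so
here is a rough Python sketch of it.  It makes a conservative assumption:
only escapes of the RFC 2396 "unreserved" characters are decoded, and any
other escape is kept but has its hex digits uppercased.  The function name
normalize_escapes is invented for this illustration.

```python
import re
import string

# Unreserved characters per RFC 2396: alphanumerics plus the "mark" set.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def normalize_escapes(text: str) -> str:
    """Decode %XX escapes of unreserved characters; uppercase the rest."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                       # "%7E" and "%7e" both become "~"
        return "%" + match.group(1).upper()   # e.g. "%2f" stays escaped as "%2F"
    return _ESCAPE.sub(fix, text)

for path in ("/~smith/home.html", "/%7Esmith/home.html", "/%7esmith/home.html"):
    print(normalize_escapes(path))
```

All three paths from the RFC's example normalize to the same string, which
is what lets an octet-by-octet comparison call them equal afterwards.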

Isn't that nice?  All laid out, how to compare for identity, even though
it doesn't give any detail on how to "locate."  Still, it's found inside
the definition of using HTTP to retrieve resources, so we can probably
take it as implicit.  Or not.  The particular example given (I guess the
authors didn't know about the recommended usage of "example.com" and
"example.org" and the like, huh?) turns up something that might be a
404, might be a redirect, but in any event isn't terribly useful (well,
it *is* a redirect, but the redirect might result in a 404; the page is
absolutely full of script and flash and other fairly useless stuff, and
the word "oops" that appears there is, perhaps, even less useful than
the cryptic three digits, not to mention displayed as very dark gray on
black).  It might be said to be a resource, in several different senses,
but at least it is the case, formally and by prescription, that the
three examples given above are all the *same* doohickey.

But not if they're used as namespace names.
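A quick way to see the difference: namespace-aware parsers compare
namespace names as plain strings, character for character, with none of
the http: equivalence rules applied.  The sketch below uses Python's
standard xml.etree.ElementTree; the element name "item" and the prefixes
are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Three namespace names that the http: comparison rules treat as equivalent.
DOC = """<root>
  <a:item xmlns:a="http://www.w3.org/"/>
  <b:item xmlns:b="http://www.W3.org/"/>
  <c:item xmlns:c="http://www.w3.org:80/"/>
</root>"""

# ElementTree expands each qualified name to "{namespace-name}local-name".
tags = [child.tag for child in ET.fromstring(DOC)]
for tag in tags:
    print(tag)

# Count how many distinct expanded names the parser produced.
print(len(set(tags)))
```

The parser reports three distinct element names, one per spelling of the
namespace name.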

Amy!
-- 
Amelia A. Lewis       amyzing@talsever.com      alicorn@mindspring.com
Yankees are compelled by some mysterious force to imitate Southern 
accents and they're so damn dumb they don't know the difference between
a Tennessee drawl and a Charleston clip.
                -- Rita Mae Brown, "Rubyfruit Jungle"