   Re: [xml-dev] URIs harmful (was RE: [xml-dev] Article: Keeping pace with

Hmmm ...

On Fri, 2002-07-19 at 22:03, Uche Ogbuji wrote:
> Simon wrote:
> > Namespaces are probably the worst place where this pollyanna attitude
> > has smacked XML, but their progeny, QNames, offer their own set of
> > problems.

[snip]
> The harm of URIs is rather well contained when we apply to them the same 
> attitude the loosely-coupled clique applies to XML itself.  Let each person 
> use them as he pleases and don't try any overarching design of URIs.  The key 
> is in loose coupling between signifier and signified, and between the agent 
> granting the name and the agent using the name.  Tight coupling between 
> signifier and signified is one of my quarrels with Topic Maps.  Tight coupling 
> between the granter and the receiver of the name is one of the reasons I'd 
> rather the W3C and others didn't address URI issues by fiat, even to squash 
> 3000-message threads.

Err, yes and no, I think.

Sending "URL" off to "URI-land," in which nothing can be known about
what's inside the box, leads to unpleasant results not only due to the
confusion over dereferencing, but also due to the change in semantic.

A URI used as a namespace is officially a string, and you can only do
string comparisons.  Case is significant.

In most of the URL formats that follow "common internet format," case is
variably significant.  Since the hostname is defined via DNS, it isn't
case sensitive; it inherits that from DNS.  WWW.W3.ORG is *identical* to
www.w3.org, because the limited subset of permissible characters in DNS
defines it so; upper- and lower-case characters are identical.  Note
that, for most schemes, the scheme is a specific identifier
(case-sensitive), and the username and path portions, where they exist,
are also case-sensitive.  For SMTP addressing, username is officially
case-sensitive (but is often resolved in case-insensitive fashion in the
field ... but that's outside the spec, let's stay inside).
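
To make the mismatch concrete, here is a rough Python sketch.  The
helper lowercase_host is my own invention for illustration (it is not
from any spec), and it only folds the case of the host portion, leaving
userinfo and path untouched, as described above.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_host(url: str) -> str:
    """Return url with only the host (and port) portion lowercased.

    Hypothetical helper for illustration; userinfo and path keep their case.
    """
    parts = urlsplit(url)
    # netloc may look like "user@HOST:port"; only the part after "@" is DNS.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    return urlunsplit(parts._replace(netloc=userinfo + at + hostport.lower()))


a = "http://WWW.W3.ORG/1999/xhtml"
b = "http://www.w3.org/1999/xhtml"

# Namespace-name rule: plain string comparison, case is significant.
print(a == b)                                  # False

# URL-style rule: the hostname comes from DNS, so its case is not significant.
print(lowercase_host(a) == lowercase_host(b))  # True
```

The same two strings are different namespace names but, read as URLs,
point at the same host and path.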

There's further confusion, because, according to DNS, www.w3.org and
www.w3c.org are the same thing.  Or, on my local network (won't resolve
for any of you, sorry), www.talsever.com == ftp.talsever.com ==
ns2.talsever.com == xfs2.talsever.com == log.talsever.com ==
talifane.talsever.com.  Using any of these as part of a URL which will
be subjected to the resolution algorithm will result in certain things
coming up identical ... but string comparison won't.
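
A small sketch of that gap, again in Python.  Everything here is an
assumption for illustration: FAKE_DNS is a made-up lookup table standing
in for a real resolver, and the addresses are documentation-range
placeholders, not the real ones on my network.

```python
# Hypothetical stand-in for DNS; real code would query a resolver instead.
# The addresses are placeholders from the 192.0.2.0/24 documentation range.
FAKE_DNS = {
    "www.talsever.com": "192.0.2.10",
    "ftp.talsever.com": "192.0.2.10",
    "ns2.talsever.com": "192.0.2.10",
    "elsewhere.example": "192.0.2.99",
}


def resolve(host: str) -> str:
    """Look a hostname up in the fake table (hostnames are case-insensitive)."""
    return FAKE_DNS[host.lower()]


def same_after_resolution(a: str, b: str) -> bool:
    """True when both hostnames resolve to the same address."""
    return resolve(a) == resolve(b)


# String comparison says these names differ ...
print("www.talsever.com" == "ftp.talsever.com")                       # False
# ... while resolution says they land in the same place.
print(same_after_resolution("www.talsever.com", "ftp.talsever.com"))  # True
```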

Note that these are separate but related problems.  In the case of
resolution, one might (as W3C seems to have done) rule out application
of a normalizing algorithm to the URI, even though each URI identifies
its preferred algorithms by its scheme, the initial element of the string
representation.  The widespread use of hostnames in the common internet
format for URLs, and W3C's recommendation that these are the preferred
form (because the publisher "owns" the namespace by virtue of owning the
domain), makes the failure to recognize and handle the rules of
normalization for DNS less than entirely compelling.  More or less the
same is true of encoding issues, whether they are url-encoded or
quoted-printable encoded.
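
The url-encoding case can be sketched the same way.  The example.org
URLs and the decoded_path helper below are assumptions of mine for
illustration, and decoding the whole path at once is a crude shortcut
(it ignores the care needed around reserved characters), not a proper
normalization algorithm.

```python
from urllib.parse import unquote, urlsplit


def decoded_path(url: str) -> str:
    """Return the path of url with %XX escapes decoded (crude illustration)."""
    return unquote(urlsplit(url).path)


a = "http://www.example.org/%7Eamy/ns"
b = "http://www.example.org/~amy/ns"

# As namespace names these are simply two different strings.
print(a == b)                              # False

# As URLs, "%7E" is just an escaped "~", so the paths name the same thing.
print(decoded_path(a) == decoded_path(b))  # True
```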

The namespaces rec specifically states that URI reference identity
requires character-by-character identity, and it appears that there has
been discussion within the TAG about the potential difficulties of doing
anything more complex.  There is clearly a great deal of complexity ...
but it gets easier and easier to challenge the claim that "this is a URI
reference" the further that the namespace string's semantic drifts from
the semantic of a URI.  A namespace name, in fact, is a thing that has
URI syntax.  Only.  It isn't a URI, or a URI reference, it is a
namespace name, which is defined to have URI syntax.  If I happen to
have a URL object off over here that's intended for use (location of a
resource, that is), it just isn't safe for me to compare its string form
with a namespace name.

Which is to say, I don't think it's really an issue of coupling, but an
issue of ambiguity, as Simon (and Len) originally suggested.  Using a
form (syntax) that carries extremely heavy connotations of an associated
semantic, and violating that semantic (here I'm not speaking of the
location algorithm, but of case-sensitivity, encoding, and resolution
only, mind), is just guaranteed to produce confusion.  Witness the
3000-message thread that Just Won't Die (and TBL reopened it with a
suggestion that "relative URIs", an utterly *meaningless* concept when
namespace names have been divorced from URI semantic (say "relative
string" and "absolute string" and see what meaning you can discover),
are not all that bad after all ... *sigh*).

Amy!
(also writing email at an hour when she should be snoozing ... if only
it would *rain* and drop the temperature into the range of bearable ...)
-- 
Amelia A. Lewis       amyzing@talsever.com      alicorn@mindspring.com
What's the end of a story?  When you begin telling it.
                -- Ursula K. Le Guin
