   URIs + Catalogs or URL+ HTTP Caching ? (was RE: [xml-dev] RDDL)


Let's say we are designing a DOCTYPE-like system to find a document that
lists all the related meta-data available for a document. What are the
differences, advantages and disadvantages between a public identifier
resolved by a catalog that can delegate to other catalogs, and a URL
identifier resolved over HTTP with a caching mechanism? In other words,
should we use the DOCTYPE PUBLIC identifier, the way external entities are
resolved, or a URL passed in a PI or in a special attribute to get our
meta-data directory, the way the XML Schema for a document is located?
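
To make the catalog side of the question concrete, here is a minimal sketch
of public identifier resolution with delegation. It is only an illustration:
the in-memory CATALOGS table, the identifiers, the example.org URL and the
resolve_public function are all hypothetical, and a real catalog would be a
file referred to by a URI rather than a Python dictionary.

```python
# Hypothetical in-memory catalogs (assumption: real ones are files named by URIs).
CATALOGS = {
    "local": {
        # Direct mappings from public identifier to system location.
        "public": {"-//EXAMPLE//DTD Local 1.0//EN": "file:///meta/local.xml"},
        # Identifiers starting with this prefix are handed to another catalog.
        "delegate": {"-//EXAMPLE-REMOTE//": "remote"},
    },
    "remote": {
        "public": {
            "-//EXAMPLE-REMOTE//DTD Doc 1.0//EN": "http://example.org/meta/doc.xml"
        },
        "delegate": {},
    },
}


def resolve_public(public_id, catalog_name="local", seen=None):
    """Return the location mapped to public_id, following delegations, or None."""
    seen = set() if seen is None else seen
    if catalog_name in seen:  # guard against catalogs delegating in a loop
        return None
    seen.add(catalog_name)

    catalog = CATALOGS[catalog_name]
    if public_id in catalog["public"]:
        return catalog["public"][public_id]

    # No direct entry: try every catalog whose delegation prefix matches.
    for prefix, target in catalog["delegate"].items():
        if public_id.startswith(prefix):
            found = resolve_public(public_id, target, seen)
            if found is not None:
                return found
    return None
```

With the URL approach there is no such lookup step: the document carries the
location itself, and the only resolution work left is the HTTP fetch.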

It seems to me that the URL way has these advantages:
- less centralised: if I want to deploy a new meta-data directory, I just
have to make it available on the web at a given URL, and write this URL into
each document that needs to reference it. Doing the same with catalogs would
imply the existence of root catalogs, in a DNS style. We would reproduce the
DNS infrastructure instead of leveraging it.
- up-to-date data: the latest version of the meta-data is always available
at the given URL, provided by its author. You don't have to update lots and
lots of catalogs.
- better performance: HTTP caching can be implemented with dedicated software
or hardware caches (it can even be done transparently, see for example
http://www.squid-cache.org/) to save some requests from going out on the
Internet. It can also be implemented at the client software level (see for
example http://www.alphaworks.ibm.com/tech/urlcache), with the cached
meta-data held in the program's memory. As a given program will often use a
limited set of document types, the client-side cache will be highly
effective. Plus, caching systems know how to handle stale data, so there is
no problem with keeping the data up to date.
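
As a rough illustration of the client-side cache mentioned in the last point,
here is a minimal sketch. It assumes a single fixed maximum age for every
entry, which is a simplification: real HTTP caches take freshness from the
response headers and can revalidate with the server. The MetadataCache class
and its fetch and clock parameters are hypothetical names; fetch stands for
whatever function actually performs the HTTP request.

```python
import time


class MetadataCache:
    """In-memory cache that refetches an entry once it is older than max_age."""

    def __init__(self, fetch, max_age=3600.0, clock=time.monotonic):
        self._fetch = fetch      # callable: url -> body (assumed to do the HTTP GET)
        self._max_age = max_age  # seconds an entry is considered fresh
        self._clock = clock      # injectable so staleness can be tested
        self._entries = {}       # url -> (body, time stored)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]      # fresh copy: no request leaves the program
        body = self._fetch(url)  # missing or stale: go back to the source
        self._entries[url] = (body, now)
        return body
```

A program that works with a handful of document types would hit the network
once per type and per freshness period, and serve everything else from memory.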

It has this disadvantage:
- it introduces a point of failure. If for any reason no connection to the
Internet is available, and there is no copy of the data in any cache, then
we're stuck. Maybe this could be solved by marking a core set of URLs as
mandatory entries in the cache, never to be cleared (but still updatable),
though it would require a special caching system (the standard HTTP caches
on the market could no longer be used as they are). Examples of such core
URLs would be the meta-data for the most common schema languages, for XHTML,
and others at the convenience of the cache administrator and users.
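
The "mandatory entries" idea could look something like the sketch below. It
is only one possible reading of the proposal: the PinnedCache class, its
parameters and the least-recently-used eviction policy are assumptions made
for the example. To keep it short, the cache always tries the network first
and falls back to its stored copy only when the fetch fails; OSError stands
in for "no connection available".

```python
from collections import OrderedDict


class PinnedCache:
    """Bounded cache whose pinned URLs are never evicted but can be refreshed."""

    def __init__(self, fetch, capacity, pinned=()):
        self._fetch = fetch            # callable: url -> body, raises OSError offline
        self._capacity = capacity      # number of entries to aim for
        self._pinned = set(pinned)     # core URLs that must always stay cached
        self._entries = OrderedDict()  # url -> body, least recently used first

    def get(self, url):
        try:
            body = self._fetch(url)
        except OSError:
            if url in self._entries:   # offline: serve the stored copy
                self._entries.move_to_end(url)
                return self._entries[url]
            raise                      # offline and never cached: we're stuck
        self._entries[url] = body      # online: pinned entries get updated too
        self._entries.move_to_end(url)
        self._evict()
        return body

    def _evict(self):
        # Drop least recently used entries while over capacity, skipping pinned ones.
        for candidate in list(self._entries):
            if len(self._entries) <= self._capacity:
                break
            if candidate not in self._pinned:
                del self._entries[candidate]
```

Note that pinned entries count against the capacity, so pinning more URLs
than the capacity allows leaves no room for anything else.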

The list of advantages / disadvantages is open...


----- Original Message -----
From: "John Cowan" <jcowan@reutershealth.com>
To: "Nicolas Lehuen" <nicolas.lehuen@ubicco.com>
Cc: <xml-dev@lists.xml.org>
Sent: Friday, January 18, 2002 7:47 PM
Subject: Re: TR: [xml-dev] RDDL (was RE: [xml-dev] Negotiate Out The Noise )

> Nicolas Lehuen wrote:
> > Are those catalog DNS like ?
> They could be, though there is currently nothing analogous to the DNS
> root servers.  Catalogs can be local or remote, and local catalogs
> can delegate to remote ones, since catalogs are referred to by
> URIs.
> --
> Not to perambulate             || John Cowan <jcowan@reutershealth.com>
>     the corridors               || http://www.reutershealth.com
> during the hours of repose     || http://www.ccil.org/~cowan
>     in the boots of ascension.  \\ Sign in Austrian ski-resort hotel

