On Tuesday 29 October 2002 16:33, Paul Prescod wrote:
> Alaric Snell wrote:
> >...
> >
> > But you can never escape the old problem of the two armies; if you send a
> > single request packet, never hear anything back, and retransmit for days
> > without ever hearing back, you still don't know if the remote server got
> > the packet or not.
>
> That's a brutal problem and it is a perfect example why it would be
> incredibly wasteful to write code designed for the local case as if it
> were designed for use over a network...which is what it sounds to me
> like you were proposing when you started talking about trappable SEGV
> signals. It simply doesn't make sense to write local code as if it were
> remote code, nor vice versa.
The same problem occurs with local code. You can execute a local procedure
call that never returns, and not know if it's completed or not. Power fails
sometimes :-)
This is why we have transactions and rollback and all that.
You don't need to always explicitly deal with this because often the default
behaviour - roll over and die with an error message or a puff of smoke - is
OK. But when you want to do things where that becomes a problem you need to
use transactions, requiring bidirectional communication to agree on something
getting done. There are still cases where two-phase commit or even committing
to a local disk can fail, but the protocols reduce the chances to one in
trillions, and if that's a problem you can go up to N-phase commit...
But it's all the same to the application code. All of this is an issue for
the systems software you're working on.
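To make that concrete, here's a minimal sketch of the usual trick (the Account
interface and all the names are invented for illustration, not from any real
library): give each logical operation its own ID so the server can discard
replays, then retransmit until acknowledged:

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UID;

// Hypothetical remote interface: the server remembers which request IDs
// it has already applied, so a retransmitted request becomes a no-op.
interface Account extends Remote {
    void credit (UID requestId, int amount) throws RemoteException;
}

class RetryingClient {
    static void creditWithRetry (Account acct, int amount) throws InterruptedException {
        UID requestId = new UID ();  // one ID per logical operation, not per attempt
        for (int attempt = 0; attempt < 5; attempt++) {
            try {
                acct.credit (requestId, amount);
                return;  // acknowledged: both ends now agree it happened exactly once
            } catch (RemoteException e) {
                Thread.sleep (1000L << attempt);  // back off, then retransmit
            }
        }
        // still ambiguous: fall back to rolling over and dying with an error message
        throw new RuntimeException ("no acknowledgement; outcome unknown");
    }
}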
> > Programming languages that have exceptions don't need to hide the network
> > when doing RPC. The RPC calls can just throw an extra RemoteException if
> > there's a networking problem, and bob's your uncle; we're exposing the
> potential unreliability of the network, and nobody ever said that any
> > method call has to be particularly fast anyway so we've nothing to hide
> > in the latency department!
>
> The question isn't whether the reliability is exposed through the API.
> The question is whether the application and API are *architected* around
> the potential for failure and latency. If they are, then they will be
> too much of a hassle for local use. This is all documented carefully in
> "A Note On Distributed Computing".
>
> You can certainly build decent protocols on top of RPC -- but only by
> sacrificing the very thing that made RPC so nice: the fact that it makes
> remote procedure calls look like local ones! An HTTP defined on top of
> RPC would still be HTTP. But HTTP API calls do not look much like
> procedure calls.
Depends on your API! int result = uri.doPost ({name: "Alaric Snell"})...
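Something like this would do; the Resource class is made up for the sake of
argument, it's just plain java.net underneath, with IOException playing the
RemoteException role:

import java.io.*;
import java.net.*;

// Hypothetical wrapper that makes an HTTP POST read like a procedure
// call while still exposing network failure as an exception.
class Resource {
    private final URL url;
    Resource (String uri) throws MalformedURLException { url = new URL (uri); }

    String doPost (String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection ();
        conn.setRequestMethod ("POST");
        conn.setDoOutput (true);
        OutputStream out = conn.getOutputStream ();
        out.write (body.getBytes ());
        out.close ();
        BufferedReader in = new BufferedReader (new InputStreamReader (conn.getInputStream ()));
        StringBuffer response = new StringBuffer ();
        String line;
        while ((line = in.readLine ()) != null) response.append (line);
        in.close ();
        return response.toString ();
    }
}

Calling it is then just new Resource ("http://example.com/people").doPost
("name=Alaric+Snell") - URL invented, obviously - and the network's
unreliability is right there in the throws clause.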
Java programs are architected around the potential for failure: they have
exceptions. As for latency... well, local disks and awkwardly long
computations or awkwardly large files or processors slower than the ones the
software was written for are as bad a source of latency as networks, and the
fact that it's not handled so well in non-distributed apps is something that
I'd moan about anyway. Windows machines really do grind when I throw
multi-gigabyte CSV files into Excel; I don't mind that it takes forever, but
it'd be nice if I could still use the machine for other things while it's at
it :-)
> But more to the point, HTTP is optimized for handling networking issues
> like lost messages and latency. It defines idempotent methods carefully
> (these help reliability). It defines cachable methods carefully (these
> help latency).
Yep. But that's not something orthogonal to RPC. There aren't enough RPC
systems that deal with these issues well, certainly, but there's nothing
stopping one being written; it's not a problem with RPC itself, just
implementations that were designed in a world with less globe-spanning
networking going on.
> > And as for the REST argument that 'there are only a few methods, GET and
> > POST and others'... I think it's wrong to say that GET and POST are
> > methods in the same sense that getName () and getAge () are; in HTTP you
> > would do GETs on separate URLs for the name and the age, or GET a single
> > URL that returns both name and age. In no way has GET replaced getName
> > and getAge. HTTP's GET and POST and so on correspond more to an RPC
> > protocol's 'INVOKE' operation than to the application-level getName ()
> > and getAge ().
>
> You can surround the issue with logical sophistry but it doesn't change
> the core, central fact: any "application" in the world can invoke GET on
> any URI without previous negotiation.
Likewise with RPC applications!
Even in scummy Java RMI, you can write a tool like:
...regexps or whatever to split out a URI like "rmi:<registry name>.<method>"
taken from the command line...
import java.rmi.*;                // Naming, Remote
import java.lang.reflect.Method;
Remote o = Naming.lookup (registryName);                        // the <registry name> part
Method m = o.getClass ().getMethod (methodName, new Class[0]);  // the <method> part: zero-arg, so an empty Class[]
Object result = m.invoke (o, new Object[0]);                    // the actual remote call
System.out.println (result.toString ());
That tool can be compiled up and then used to call any zero argument getter
method anywhere. If you want to get into ones with arguments then there's a
UI issue of setting up arbitrary Java objects for parameters, but it's still
doable. You should really call result.toString () in a sandbox, too, since
it'd be arbitrary code, but I'm leaving that out for the sake of ease.
Et voila - an RMI browser!
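Assuming it got compiled up as, say, RMIBrowser, and given some invented
service name:

java RMIBrowser "rmi://bookserver.example.org/catalogue.getName"

...and it prints whatever the remote getName () returns.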
> There are dozens of standards and
> tools with support for that built in, starting with HTML, through XLink,
> through the semantic web technologies, through XSLT and XPointer,
> through the stylesheet PI, through Microsoft Office (including, I'd
> wager XDocs), through caches and browsers.
Pretty much every Unix system in the world has ONC RPC in it, and every
"Java-enabled platform" can do RMI, et cetera.
> That will never be true for getName and getAge. That makes GET
> inherently superior to getName and getAge.
But GET is not on the same level as getName () and getAge ().
Under HTTP you get GET and PUT and POST and OPTIONS and HEAD and all that.
Under RMI protocols you have INVOKE and AUTHENTICATE and PING and all that.
One does not say "I will GET amazon" or "I will GET a book from amazon"; one
GETs a URL that returns some aspect of information about a book.
With RMI, one INVOKEs a method that returns some aspect of information about
a book.
That's all there is to it! RMI protocols, like HTTP, have a few operations to
perform meta-actions like OPTIONS and HEAD. HTTP has about three different
forms of invocation, GET / POST / PUT (although PUT is potentially redundant
with POST around), while most RMI protocols just do one; but as I've said
before, I've found it trivial to add metainformation to RPC interfaces
specifying that methods are idempotent or cacheable, which the runtime uses
to do what HTTP does with GET and POST.
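For instance (the RpcRuntime hooks below are entirely made up, just to show
the shape of the idea), the metainformation can be declared alongside the
interface and consulted by the stub layer:

import java.rmi.Remote;
import java.rmi.RemoteException;

interface Book extends Remote {
    String getName () throws RemoteException;       // idempotent and cacheable, like GET
    void setName (String n) throws RemoteException; // neither, like POST
}

// Hypothetical runtime hooks: the stub layer consults these flags to
// decide, per method, whether it may retry on timeout and whether it
// may serve a cached result.
interface RpcRuntime {
    void markIdempotent (Class iface, String method);
    void markCacheable (Class iface, String method, int maxAgeSeconds);
}

class BookMeta {
    static void declare (RpcRuntime runtime) {
        runtime.markIdempotent (Book.class, "getName");
        runtime.markCacheable (Book.class, "getName", 60);
    }
}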
So GET is not comparable with getName (). Under HTTP, the application-level
action you're performing and the object you're performing it on are all
stuffed into the URL. With RPC, you have two explicit fields in the request
packet for this, that's all. Your application does not have to have getName
() hard coded into it any more than your HTTP client needs to have a URL
encoded into it. Many Web browsers these days will convert "amazon" into
"http://www.amazon.com/"; just so, an RPC system browser used for debugging,
when presented with an object identifier but not told which method to invoke,
could invoke a default method such as "getInterface ()" to get a description
of the object's interface (just like WSDL). A user interface system I
designed over RPC, when presented with an object identifier, calls
"getUserInterface ()" first to try to find an explicitly-defined UI for the
object; if that fails because the method is not implemented, it calls
"getInterface ()" to get a raw list of methods and presents the user with a
list (it's a bit like an HTTP server giving you a directory listing in the
absence of index.html).
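In code the fallback is tiny; every type here is hypothetical, it's the
design that matters:

import java.rmi.RemoteException;

// Invented types for this sketch only:
interface ObjectRef {
    Object invoke (String method) throws RemoteException;  // call by name, no arguments
}
class MethodNotImplementedException extends RemoteException {}

class UiBuilder {
    // Try the explicitly-defined UI first; fall back to the raw method
    // list, like an HTTP server falling back to a directory listing
    // when there's no index.html.
    Object describe (ObjectRef obj) throws RemoteException {
        try {
            return obj.invoke ("getUserInterface");
        } catch (MethodNotImplementedException e) {
            return obj.invoke ("getInterface");
        }
    }
}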
> Even if one does not buy the
> entire REST argument, it amazes me that there are still people out there
> arguing that it is better to segment the namespace rather than have a
> single unified namespace for retrieving all information.
HTTP: "http://<server>/<path>/<file>?<args>"
RMI: "//<server>/<object>.<methodname>(<args>)"
Still four components in each name; about equally segmented if you ask me :-)
> You talk about
> reinventing wheels and learning from the past. Surely the ONE THING we
> learned from the massive success of the Web is that there should be a
> single namespace.
Not quite true... there's confusion with URIs being URNs or URLs; perhaps
there are two namespaces that shouldn't have been merged.
One namespace for locating resources was invented long before the Web, and
the Web doesn't even do too great a job of this universal namespace thing;
they just took a lot of namespaces and federated them into URLs with the
scheme part. You can't rely on much more than "http:" being supported in any
given situation, so it's a fragmented namespace. This was presumably done to
enable HTTP to lever itself over FTP and Gopher by interoperating with them
on a level playing field, but now it's less applicable.
In the ISO OSI world you have X.500 for URLs and OIDs for URNs.
X.500 is cursed with an ugly name syntax, sadly, for it's otherwise a great
resource-locating namespace: a single arrangement with a single protocol to
implement, so none of this "which schemes does this implementation support"
stuff.
OIDs are, somewhat controversially, purely numeric - this decision was taken
to prevent the various problems with names that can be trademarked to nastily
restrict interoperability, and with names that have to be changed because the
company involved got bought or whatever. Although this makes them less
friendly for programmers, it avoids a lot of nasty pitfalls.
> Paul Prescod
ABS
--
A city is like a large, complex, rabbit
- ARP