xml-dev - Distributed versus local processing

Distributed versus local processing
[ Lists Home | Date Index | Thread Index ]
To: "Alaric B. Snell" <alaric@alaric-snell.com>
Subject: Distributed versus local processing
From: Paul Prescod <paul@prescod.net>
Date: Thu, 31 Oct 2002 09:32:52 -0700
Cc: xml-dev@lists.xml.org
References: <36C08A70-E9A2-11D6-9F26-0030657E2F34@mac.com> <20021028214806.0EA535542@calm.warhead.org.uk> <3DBEB84C.8060507@prescod.net> <E186pZI-0001AB-00@calvin.frontwire.com>
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.1) Gecko/20020826
Alaric B. Snell wrote:
> ...
> 
> The same problem occurs with local code. You can execute a local procedure 
> call that never returns, and not know if it's completed or not. Power fails 
> sometimes :-)
> 
> This is why we have transactions and rollback and all that.

The vast majority of local applications are NOT written with this in 
mind. People don't usually write their JSPs with an exception handler 
that retries in case the database dies. It would be way too expensive.

> ...
> But it's all the same to the application code. All of this is an issue for 
> the systems software you're working on.

That's the myth that the Waldo paper tries to puncture.

  * http://research.sun.com/techrep/1994/abstract-29.html

>...
> Hava programs are architected around the potential for failure, they have 
> exceptions. 

That's ridiculous. The fact that exceptions exist does not require 
architecting for failure. This is a very common pattern:

try{
	doSomethingUrgent();
}catch(NetworkFailure e){
	System.err.println("Network failure "+e);
}

Java programs _can be_ architected for failure but that is completely 
orthoganal to Java language features. Architecting for failure is about 
thinking through what happens if each method call fails or times out, 
whether to retry, how to compensate if some calls succeed and others do 
not, etc.

> ... As for latency... well, local disks and awkwardly long 
> computations or awkwardly large files or processors slower than the ones the 
> software was written for are as bad a source of latency as networks, and the 
> fact that it's not handled so well in non-distributed apps is something that 
> I'd moan about anyway. 

Most people do not want to pay four times as much for local software so 
that it can be as difficult to develop as network software.


> ... Windows machines really do grind when I throw 
> multi-gigabyte CSV files into Excel; I don't mind that it takes forever, but 
> it'd be nice if I could still use the machine for other things while it's at 
> it :-)

Sure, there are reasonable things that can be done in the local case as 
if it were the remote case. But it would be crazy for Excel to implement 
an exponential backoff retry policy for disk writes.

 > ...
> Yep. But that's not something orthogonal to RPC. There aren't enough RPC 
> systems that deal with these issues well, certainly, but there's nothing 
> stopping one being written; it's not a problem with RPC itself, just 
> implementations that were designed in a world with less globe-spanning 
> networking going on.

So you theorize. When your protocol comes out we can all judge. I 
predict we will find otherwise.

>...
>>You can surround the issue with logical sophistry but it doesn't change
>>the core, central fact: any "application" in the world can invoke GET on
>>any URI without previous negotiation.
> 
> 
> Likewise with RPC applications!

That's not true.

> Even in scummy Java RMI, you can write a tool like:
> 
> ...regexps or whatever to split out a URI like "rmi:<registry name>.<method>"

You are _proposing_ that what you said above _could_ be true in the 
future with RMI if you can convince people to universally implement an 
RMI URI scheme. I am confident that you won't succeed so in my mind you 
are comparing an existing system to an impossibility.

>   taken from the command line...
> 
> Remote o = Naming.lookup (<registry name>);
> Method m = o.getClass ().getMethod (<method>,new Object[0]);
> Object result = m.invoke (o, new Object[0]);
> 
> System.out.println (result.toString ());

This makes no logical sense. In order to get the data for a stock quote 
I have to download an appropriate class, find a static method, invoke a 
method coerce the result to string and return it?

> That tool can be compiled up and then used to call any zero argument getter 
> method anywhere. If you want to get into ones with arguments then there's a 
> UI issue of setting up arbitrary Java objects for parameters, but it's still 
> doable. You should really call result.toString () in a sandbox, too, since 
> it'd be arbitrary code, but I'm leaving that out for the sake of ease.

You think people should trust the Java sandbox with the execution of 
their most crucial business processes?

> Et voila - an RMI browser!
> 
> 
>>There are dozens of standards and
>>tools with support for that built in, starting with HTML, through XLink,
>>through the semantic web technologies, through XSLT and XPointer,
>>through the stylesheet PI, through Microsoft Office (including, I'd
>>wager XDocs), through caches and browsers.
> 
> 
> Pretty much every Unix system in the world has ONC RPC in it, and every 
> "Java-enabled platform" can do RMI, et cetera.

Are you really comparing "every Unix system in the world" to the 
deployed base of web technologies?

>>That will never be true for getName and getAge. That makes GET
>>inherently superior to getName and getAge.
> 
> 
> But GET is not on the same level as getName () and getAge ().
> 
> Under HTTP you get GET and PUT and POST and OPTIONS and HEAD and all that.
> 
> Under RMI protocols you have INVOKE and AUTHENTICATE and PING and all that.

AUTHENTICATE and PING are in the same class as OPTIONS.

HEAD is an optimized alias for GET.

Therefore the real comparison is between "GET", "PUT", POST", "DELETE" 
on the one hand and "INVOKE" on the other hand.

> One does not say "I will GET amazon" or "I will GET a book from amazon"; one 
> GETs a URL that returns some aspect of information about a book.

GET is a short form for "GET a representation of the referenced 
resource." Therefore one does say: "I will GET a representation of 
Amazon" or "I will get a representation of a book at Amazon."

> With RMI, one INVOKEs a method that returns some aspect of information about 
> a book.

And perhaps changes the object.

> That's all there is to it! RMI protocols, like HTTP, have a few operations to 
> perform meta-actions like OPTIONS and HEAD. RMI has about three different 
> forms of invocation, GET / POST / PUT (although PUT is potentially redundant 
> with POST around) while most RMI protocols just do one;

I think you meant to say that "HTTP has about three...". Actually it has 
four, GET/PUT/POST/DELETE.

> ... but as I've said 
> before I've found it trivial to add metainformation to RPC interfaces 
> specifying that methods are idempotent or cacheable, which the runtime uses 
> to do what HTTP does with GET and POST.

There is a huge difference between having globally available the 
definition of which are idempotent, cachable and safe and having that in 
metainformation. If it is available in metainformation then I have to 
write link-following code like this:

link = getLinkFromSomewhere()
service = link.parseService()
method = link.parseMethod()
metainfo = service.getMetaInfo(method)
if(metainfo.isSafe()){
    newdata = link.getData()
}else{
    println("The link is not safe");
}

And you get convoluted logic like this in caches, firewalls and other 
intermediaries. Whereas the equivalent code with HTTP is:

data = link.GET()

What's the benefit? And by the way, this is another comparison of a 
running system to a system that might exist one day.

> So GET is not comparable with getName ().  Under HTTP, the application-level
> action you're performing and the object you're performing it on are all 
> stuffed into the URL. 

You're incorrect. The application-level action is "get me representation 
of this resource." This can be seen by looking at any HTTP application. 
It is very common to see them directly invoke HTTP GET and POST methods 
or through a VERY thin layer of abstraction (sometimes GET is done 
through a URL library). Whereas the equivalent RMI invokes getName or 
getDate. So to the application programmer, GET is the logical equivalent 
of getName and getDate. The only difference is that HTTP moves 
everything relating to the TYPE and ROLE of the data out of the method 
name because it has no need to be there. You can easily refactor your 
"API" so that everything to do with the type and role of the data lives 
_in_ the data and not in the operation names. IMHO, this is merely good 
design.

> ... With RPC, you have two explicit field in the request 
> packet for this, that's all. Your application does not have to have getName 
> () hard coded into it any more than your HTTP client needs to have a URL 
> encoded into it 

See above. Of course you can always push the responsibility fo figuring 
out whether the thing is safe to invoke or invoke repeatedly or cache up 
a level to the programmer. But I don't think it helps anything.

- although many Web browsers these days will convert "amazon"
> into "http://www.amazon.com/";; just as an RPC system browser used for 
> debugging would, when presented with an object identifier but not told which 
> method to invoke, invoke a default method such as "getInterface ()" to get a 
> description of the object's interface (just like WSDL).

Even many proponents of WSDL admit that it is a terrible idea to have 
the default method on a service be "getInterface". The interface is a 
subset of the data you want to know about a service. The other part of 
the infomation you want is the _state_. HTTP wraps the two things  up 
into a single message. By convention and design, it also wraps up many 
individual bits of state and type into one message. This makes sense for 
the reasons described here:

  * http://www.fawcette.com/xmlmag/2002_06/magazine/departments/endtag/

Calling "getName", "getDate", "getType", etc. to construct a client-side 
representation of a service is just poor design.

> ... A user interface 
> system I designed over RPC, when presented with an object identifier, calls 
> "getUserInterface ()" first to try to find an explicitly-defined UI for the 
> object and if that fails because the method is not implemented then calls 
> "getInterface ()" to get a raw list of methods and presents the user with a 
> list (it's a bit like an HTTP server giving you a directory listing in the 
> absence of index.html).

The HTTP server's directory listing is listing _state_ not _methods_. At 
runtime nobody cares about methods. They want to know about available 
_state_. You force them to indirect through an extra level of 0-argument 
methods presumably tagged with a magical flag that says: "I'm really 
like an HTTP hyperlink. Trust me!". I don't see how that is progress.

> ...
>>Even if one does not buy the
>>entire REST argument, it amazes me that there are still people out there
>>arguing that it is better to segment the namespace rather than have a
>>single unified namespace for retrieving all information.
> 
> 
> HTTP: "http://<server>/<path>/<file>?<args>"
> RMI: "//<server>/<object>.<methodname>(<args>)"
>
> Still four components in each name; about equally segmented if you ask me :-)

Obviously I was comparing REST to RMI as it exists and not RMI as it 
might be.

> ...
> Not quite true... there's confusing with URIs being URNs or URLs, perhaps 
> there's two namespaces that shouldn't have been merged.

"Confusion" is not a technical problem and it is my sense that this 
confusion is transitory.

> One namespace for locating resources was invented long before the Web, and 
> the Web doesn't even do too great a job of this universal namespace thing; 
> they just took a lot of namespaces and federated them into URLs with the 
> scheme part. You can't rely on much more than "http:" being supported in any 
> given situation so it's a fragmented namespace. 

As we move more services to HTTP, the URI space becomes more and more 
unified. But anyhow, HTTP, FTP and MAILTO are pretty much universally 
supported. Even small devices can handle these three.

> ... This was presumably done to 
> enable the http to lever itself over ftp and gopher by interoperating with 
> them on a level playing field, but now it's less applicable.

The fact that there are few widely deployed new URI schemes is a 
strength not a weakness. You say you're developing a new IMAP protocol. 
I hope you've learned the lesson and are using URI addresses and HTTP 
operations. Otherwise you're the one fragmenting the namespace (and your 
protocol will have a higher hurdle to adoption).

Nevertheless, if we ever have to move beyond the HTTP protocol to an 
even more general protocol, the Web has the syntactic and semantic 
freedom to make that move. I consider that a strength, not a weakness. 
It does take discipline to use that feature right.

> In the ISO OSI world you have X.500 for URLs and OIDs for URNs.
> 
> X.500 is cursed with an ugly name syntax, sadly, for it's a great resource 
> locating namespace arrangement with a single protocol to implement, so none 
> of this "which schemes does this implementation support" stuff.

And none of the flexibility to roll out a new protocol when it becomes 
necessary.

  Paul Prescod
Follow-Ups:
- Re: [xml-dev] Distributed versus local processing
  - From: "Alaric B. Snell" <alaric@alaric-snell.com>
References:
- Note from the Troll
  - From: tblanchard@mac.com
- Re: [xml-dev] Note from the Troll
  - From: Alaric Snell <alaric@alaric-snell.com>
- Re: [xml-dev] Note from the Troll
  - From: Paul Prescod <paul@prescod.net>
- Re: [xml-dev] Note from the Troll
  - From: "Alaric B. Snell" <alaric@alaric-snell.com>
Prev by Date: Re: [xml-dev] Newbie question:DTD problem(#PCDATA)
Next by Date: Re: [xml-dev] Article: "The horror of XML"
Previous by thread: Re: [xml-dev] Note from the Troll
Next by thread: Re: [xml-dev] Distributed versus local processing
Index(es):
- Date
- Thread