xml-dev - Re: [xml-dev] Distributed versus local processing

To: Paul Prescod <paul@prescod.net>
Subject: Re: [xml-dev] Distributed versus local processing
From: "Alaric B. Snell" <alaric@alaric-snell.com>
Date: Thu, 31 Oct 2002 18:16:29 +0000
Cc: xml-dev@lists.xml.org
In-reply-to: <3DC15B34.50809@prescod.net>
References: <36C08A70-E9A2-11D6-9F26-0030657E2F34@mac.com> <E186pZI-0001AB-00@calvin.frontwire.com> <3DC15B34.50809@prescod.net>
On Thursday 31 October 2002 16:32, Paul Prescod wrote:
> Alaric B. Snell wrote:
> > ...
> >
> > The same problem occurs with local code. You can execute a local
> > procedure call that never returns, and not know if it's completed or not.
> > Power fails sometimes :-)
> >
> > This is why we have transactions and rollback and all that.
>
> The vast majority of local applications are NOT written with this in
> mind. People don't usually write their JSPs with an exception handler
> that retries in case the database dies. It would be way too expensive.

Quite, it's not always worth the effort. But it's there when it matters.

The point still remains, though, that networks have no monopoly on being able 
to fail and drop packets. They present higher chances of failure in most 
cases, sure, but that's not a qualitative jump, just a quantitative one.

> Java programs _can be_ architected for failure but that is completely
> orthogonal to Java language features. Architecting for failure is about
> thinking through what happens if each method call fails or times out,
> whether to retry, how to compensate if some calls succeed and others do
> not, etc.

Yes, my intent was that the standard libraries and the language itself are 
such that you have the tools you need to architect for failure, not that the 
programmer HAS to use them.

You can solve this problem in systems software, though, or at least reduce 
the gruntwork of solving it in applications, by coupling transactions with 
your exception mechanism: an uncaught exception leaving a transaction block 
rolls the transaction back, and two-phase (or more) commit covers the 
networked portions. The default behaviour in the event of failure is then to 
undo everything done since the last transaction began; programmers insert the 
transaction boundaries as checkpoints.

Of course, you still need to manually handle the undoing of events like 
missile launches and printing out cheques :-)

> > ... As for latency... well, local disks and awkwardly long
> > computations or awkwardly large files or processors slower than the ones
> > the software was written for are as bad a source of latency as networks,
> > and the fact that it's not handled so well in non-distributed apps is
> > something that I'd moan about anyway.
>
> Most people do not want to pay four times as much for local software so
> that it can be as difficult to develop as network software.

Exactly. I'm interested in building entire software systems (including the 
OS) around transactions to see how much easier this makes error handling...

> > ... Windows machines really do grind when I throw
> > multi-gigabyte CSV files into Excel; I don't mind that it takes forever,
> > but it'd be nice if I could still use the machine for other things while
> > it's at it :-)
>
> Sure, there are reasonable things that can be done in the local case as
> if it were the remote case. But it would be crazy for Excel to implement
> an exponential backoff retry policy for disk writes.

Why would it want to? Different problem. If the disk fails to write after a 
couple of retries then that sector is bad and needs to be remapped elsewhere 
and the replacement of the disk requested. This is best done at a lower 
level, though, beneath the disk API. But you wouldn't have me hide any 
networking issues at a lower level beneath an RPC API, hmm?

> > Yep. But that's not something orthogonal to RPC. There aren't enough RPC
> > systems that deal with these issues well, certainly, but there's nothing
> > stopping one being written; it's not a problem with RPC itself, just
> > implementations that were designed in a world with less globe-spanning
> > networking going on.
>
> So you theorize. When your protocol comes out we can all judge. I
> predict we will find otherwise.

It's not really on topic, but I can post a link to XML-DEV when I'm done if 
you like!

> >>You can surround the issue with logical sophistry but it doesn't change
> >>the core, central fact: any "application" in the world can invoke GET on
> >>any URI without previous negotiation.
> >
> > Likewise with RPC applications!
>
> That's not true.

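import java.lang.reflect.Method;
import java.rmi.Naming;
import java.rmi.Remote;

// Look up a remote object by name, then call the named method on it reflectively.
// Exception remapping is left out; the checked exceptions simply propagate.
public static Object invoke (String objectName, String methodName,
		Class<?>[] paramTypes, Object[] args) throws Exception {
	Remote r = Naming.lookup (objectName);
	Method m = r.getClass ().getMethod (methodName, paramTypes);
	return m.invoke (r, args);
}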

...plus some exception remapping, and you're done. That's paraphrased from 
live code currently in use, thank you very much! We also implemented our own 
exponential backoff retry to handle the server being temporarily down, we 
don't look the name up again on every call, and so on.
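For anyone who hasn't written one, exponential backoff retry might look 
something like this. It is a hypothetical helper, not the code mentioned 
above; in a real RMI client you would probably catch only 
java.rmi.RemoteException, and only retry calls known to be idempotent.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: retry a call, doubling the delay after each failure.
final class Backoff {
    static <T> T retry(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller see the last failure
                }
                Thread.sleep(delay);
                delay *= 2; // waits of 1x, 2x, 4x, ... the initial delay
            }
        }
    }
}
```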

> > Even in scummy Java RMI, you can write a tool like:
> >
> > ...regexps or whatever to split out a URI like "rmi:<registry
> > name>.<method>"
>
> You are _proposing_ that what you said above _could_ be true in the
> future with RMI if you can convince people to universally implement an
> RMI URI scheme. I am confident that you won't succeed so in my mind you
> are comparing an existing system to an impossibility.

You don't need the URI scheme to make it happen; we already do it with two 
strings, I'm just saying that this isn't really an important distinction. 
mailto:user@host is just a URI with two strings in it.

> >   taken from the command line...
> >
> > Remote o = Naming.lookup (<registry name>);
> > Method m = o.getClass ().getMethod (<method>,new Class[0]);
> > Object result = m.invoke (o, new Object[0]);
> >
> > System.out.println (result.toString ());
>
> This makes no logical sense. In order to get the data for a stock quote
> I have to download an appropriate class, find a static method, invoke a
> method coerce the result to string and return it?

Not done much Java? Ok. What it says is:

1) Get a "connection" to the object given the name. Call the result 'o'
2) Find the zero-param method with the given name on that object
3) Call the method (giving it an empty list of arguments)
4) Ask the result to display itself
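The same four steps can be tried locally without an RMI registry. In this 
sketch a HashMap stands in for the registry (an assumption for illustration; 
ReflectiveGet is a hypothetical name), and the name has the form 
"<object>.<method>".

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a local map plays the part of the RMI registry.
final class ReflectiveGet {
    static final Map<String, Object> registry = new HashMap<>();

    // Takes "<object>.<method>" and calls that zero-argument method.
    static String get(String name) throws Exception {
        int dot = name.lastIndexOf('.');
        String objectName = name.substring(0, dot);
        String methodName = name.substring(dot + 1);

        Object o = registry.get(objectName);           // 1) get the object by name
        Method m = o.getClass().getMethod(methodName); // 2) find the zero-param method
        Object result = m.invoke(o);                   // 3) call it with no arguments
        return result.toString();                      // 4) ask the result to display itself
    }
}
```

Registering the string "hello" as "greeting" and asking for 
"greeting.toUpperCase" goes through exactly those four steps.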

> > That tool can be compiled up and then used to call any zero argument
> > getter method anywhere. If you want to get into ones with arguments then
> > there's a UI issue of setting up arbitrary Java objects for parameters,
> > but it's still doable. You should really call result.toString () in a
> > sandbox, too, since it'd be arbitrary code, but I'm leaving that out for
> > the sake of ease.
>
> You think people should trust the Java sandbox with the execution of
> their most crucial business processes?

I don't think many organisations will put their most crucial business 
processes in the toString () method. That method is available on all objects 
and is used as the default way of displaying an object; it's handy for 
debugging, since you can just print an object to the log to see its state. 
That's its main use.

> > Pretty much every Unix system in the world has ONC RPC in it, and every
> > "Java-enabled platform" can do RMI, et cetera.
>
> Are you really comparing "every Unix system in the world" to the
> deployed base of web technologies?

Yes; why?

> >>That will never be true for getName and getAge. That makes GET
> >>inherently superior to getName and getAge.
> >
> > But GET is not on the same level as getName () and getAge ().
> >
> > Under HTTP you get GET and PUT and POST and OPTIONS and HEAD and all
> > that.
> >
> > Under RMI protocols you have INVOKE and AUTHENTICATE and PING and all
> > that.
>
> AUTHENTICATE and PING are in the same class as OPTIONS.

Yep.

> HEAD is an optimized alias for GET.

I'd say that's not the best way of explaining it, but I know you know what it 
means.

> Therefore the real comparison is between "GET", "PUT", "POST", "DELETE"
> on the one hand and "INVOKE" on the other hand.

Yep.

> > One does not say "I will GET amazon" or "I will GET a book from amazon";
> > one GETs a URL that returns some aspect of information about a book.
>
> GET is a short form for "GET a representation of the referenced
> resource." Therefore one does say: "I will GET a representation of
> Amazon" or "I will get a representation of a book at Amazon."

Yep.

> > With RMI, one INVOKEs a method that returns some aspect of information
> > about a book.
>
> And perhaps changes the object.

I'm comparing it to GET in this case; get...() methods shouldn't change 
things in ways that matter, any more than GET should.

> > That's all there is to it! RMI protocols, like HTTP, have a few
> > operations to perform meta-actions like OPTIONS and HEAD. RMI has about
> > three different forms of invocation, GET / POST / PUT (although PUT is
> > potentially redundant with POST around) while most RMI protocols just do
> > one;
>
> I think you meant to say that "HTTP has about three...". Actually it has
> four, GET/PUT/POST/DELETE.

DELETE doesn't get much press.

> > ... but as I've said
> > before I've found it trivial to add metainformation to RPC interfaces
> > specifying that methods are idempotent or cacheable, which the runtime
> > uses to do what HTTP does with GET and POST.
>
> There is a huge difference between having globally available the
> definition of which are idempotent, cachable and safe and having that in
> metainformation.

Whether you GET or POST a URL to achieve your effect IS metainformation! And 
it's just as globally available with RPC systems that have reflection, like 
RMI and CORBA.

> If it is available in metainformation then I have to
> write link-following code like this:
>
> link = getLinkFromSomewhere()
> service = link.parseService()
> method = link.parseMethod()
> metainfo = service.getMetaInfo(method)
> if(metainfo.isSafe()){
>     newdata = link.getData()
> }else{
>     println("The link is not safe");
> }
>
> And you get convoluted logic like this in caches, firewalls and other
> intermediaries. Whereas the equivalent code with HTTP is:
>
> data = link.GET()

No, the equivalent RPC code to that is data = object.getThing () :-)

The HTTP equivalent of the weirdness you posted is probably something 
involving parsing WSDL?

But the metadata extraction code I've written for Java looks like:

Object link = getLinkFromSomewhere (); // you missed this out in the HTTP one

Class<?> c = link.getClass ();

boolean idempotent;

try {
   // Convention: a public boolean field named <method>IsIdempotent
   idempotent = c.getField (methodName + "IsIdempotent").getBoolean (link);
}
catch (NoSuchFieldException | IllegalAccessException e) {
   // There was no explicit (accessible) flag, assume not idempotent for now
   idempotent = false;
}

> What's the benefit? And by the way, this is another comparison of a
> running system to a system that might exist one day.

..."code I've written for Java"...
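A small self-contained version of the "<method>IsIdempotent" flag convention, 
with a hypothetical QuoteService declaring getPriceIsIdempotent. A runtime 
could consult isIdempotent before deciding whether a failed call may safely 
be retried, which is roughly what HTTP intermediaries infer from GET.

```java
import java.lang.reflect.Field;

// Hypothetical sketch: look up a public boolean field "<method>IsIdempotent".
final class FlagLookup {
    static boolean isIdempotent(Object target, String methodName) {
        try {
            Field f = target.getClass().getField(methodName + "IsIdempotent");
            return f.getBoolean(target);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return false; // no explicit (accessible) flag: assume not idempotent
        }
    }
}

// A hypothetical service that declares its own metainformation.
class QuoteService {
    public static final boolean getPriceIsIdempotent = true;

    public double getPrice() { return 42.0; }

    public void placeOrder() { }
}
```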

> > So GET is not comparable with getName ().  Under HTTP, the
> > application-level action you're performing and the object you're
> > performing it on are all stuffed into the URL.
>
> You're incorrect. The application-level action is "get me representation
> of this resource." This can be seen by looking at any HTTP application.
> It is very common to see them directly invoke HTTP GET and POST methods
> or through a VERY thin layer of abstraction (sometimes GET is done
> through a URL library). Whereas the equivalent RMI invokes getName or
> getDate. So to the application programmer, GET is the logical equivalent
> of getName and getDate. The only difference is that HTTP moves
> everything relating to the TYPE and ROLE of the data out of the method
> name because it has no need to be there. You can easily refactor your
> "API" so that everything to do with the type and role of the data lives
> _in_ the data and not in the operation names. IMHO, this is merely good
> design.

It's equivalent! Reflective programming languages make this explicit! The 
ability to treat method names as fixed parts of the program, or to make them 
a parameter to a general "invoke" method, has been around since... early Lisp 
days? You can see it above in the Java code that calls arbitrary methods: the 
method name is an *argument* there. You use whichever approach is more 
convenient, and are not *bound* by the pros and cons of *either*.

> > HTTP: "http://<server>/<path>/<file>?<args>"
> > RMI: "//<server>/<object>.<methodname>(<args>)"
> >
> > Still four components in each name; about equally segmented if you ask me
> > :-)
>
> Obviously I was comparing REST to RMI as it exists and not RMI as it
> might be.

Good; so was I.
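To make the four-component comparison concrete, here is a sketch that splits 
both name forms with regular expressions. The patterns are simplified 
assumptions (no ports, fragments, or nested parentheses), and NameParts is a 
hypothetical name; List.of needs Java 9 or later.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: server, path/object, file/method, args for each form.
final class NameParts {
    private static final Pattern HTTP =
        Pattern.compile("http://([^/]+)/(.*)/([^/?]+)\\?(.*)");
    private static final Pattern RMI =
        Pattern.compile("//([^/]+)/([^.]+)\\.([^(]+)\\((.*)\\)");

    static List<String> split(Pattern p, String name) {
        Matcher m = p.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognised name: " + name);
        }
        return List.of(m.group(1), m.group(2), m.group(3), m.group(4));
    }

    static List<String> http(String url) { return split(HTTP, url); }

    static List<String> rmi(String name) { return split(RMI, name); }
}
```

Both yield four strings; which one you call "the action" is a matter of 
convention.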

>   Paul Prescod

ABS

-- 
A city is like a large, complex, rabbit
 - ARP