   Hobbsian processes


I'm coming to understand little by little. In a sense it is just "be
liberal in what you accept."

It seems to me that the decision of whether upstream conforms to
downstream or vice versa depends entirely on the nature of the
relationship between them. As you have said elsewhere:

"Now, clearly, a mom-and-pop shop wanting to leverage the Web into a
supplier contract to General Motors is perfectly happy to label its
wares however GM expects."

 * http://www.xml.com/lpt/a/2002/05/29/perry.html

If several small services find themselves in this position of having to
conform to the whims of many large organizations, then they might choose
to band together and define a standard which gives their interface an
economy of scale that can compete with that of the big players.

> I would say that the primary feature of XML documents is that each is explicitly a data
> structure.

Not as the word is used here: http://www.nist.gov/dads/
Nobody will ever refer to the XHTML "data structure".

> For the past 12 years (first with homegrown syntactic rules, and since 1998 with
> well-formed XML) we have built and operated all of our systems on the principles I am
> promoting here.

How did you push XML upstream to your business partners? Why could you
not push a particular XML vocabulary up to them?

> Unfortunately, that form is unknown to, and
> contains content which does not apply to, the Malaysian order
> execution application which must now process it.

But what if the opposite is true? What if the form lacks information
required by Malaysian law? Sometimes only the calling process has the
extra information you need to do your job. You can interrogate the
caller, but if that fails (because the caller is too stupid or speaks
too different a language to understand your questions) then the
transaction fails. The programmer of the caller will then send you an
email asking: "What the hell does your process need for its input?"
Would it not be polite to have a schema available? Once again, this
presumes that it is more economically logical for THEM to change their
behaviour than for YOU to change yours. Sometimes that's the case,
sometimes it isn't.
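
When it is, publishing even a minimal DTD answers that programmer's
question before it is asked. A sketch (the element and attribute names
here are hypothetical, not taken from any real vocabulary):

  <!ELEMENT order (account, instrument, quantity, price?)>
  <!ATTLIST order side (buy|sell) #REQUIRED>
  <!ELEMENT account    (#PCDATA)>
  <!ELEMENT instrument (#PCDATA)>
  <!ELEMENT quantity   (#PCDATA)>
  <!ELEMENT price      (#PCDATA)>

That is the whole contract: a caller can validate against it before
transmitting, instead of discovering by email what the process needs.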

> In the early 1980's we solved these problems by building massive 
> any-to-any transformation switches, capable of going from the 
> output of any process used by any of our customers or
> their counterparties to the input of any other application to which 
> we had ever seen it connected. This is a disaster...

No doubt. But the solution most industries take is a standard. For
instance, rather than a hundred hypertext languages, we standardize on
HTML. Rather than a hundred labelled tree languages, we standardize on
XML. Rather than a hundred character sets, we standardize on Unicode. Do
you feel that it is inappropriate for a downstream process to push *any*
requirement on an upstream one? Should my Docbook to PDF converter
attempt to handle EBCDIC?

Now I want to understand why you reject this typical solution. You have
railed against these "Standard Data Vocabularies" as you call them. You
claim (in the same May 2002 article as above): "I have argued for years
that, on the basis of their mechanism for elaborating semantics, SDVs
are inherently unreliable for the transmission or repository of
information."

In that article, you say:

> [P]ower to shape the outcome of a patent process has been shifted 
> to [filers] by the SDV. By design, the patenting process will 
> begin with the filer's own assertions conveyed in the SDV.

Just as today. When I go for a loan I fill out a form and my assertions
*start* the process. But they do not necessarily end it. I do not
understand how the situation would be better if the input to the process
were NOT declared anywhere and were just implicit. Through experiments I
could still find the holes in your process, the places where you do not
check assertions or preconditions carefully.

Furthermore, people "game" all systems including Unicode and IP. Should
those standards be removed and replaced with heuristics?

> Particular combinations of components from the SDV which might 
> seem illogical to designers of that vocabulary may be found 
> to result in process outcomes which benefit the submitters 
> in ways never anticipated by designers.

First, please give an example. Second, please describe how a situation
without an SDV would be better.

> It may, of course,
> turn out not to be an order, in which case it will have to be rejected 
> because the order execution application can do nothing with it.

May it discard not only things that are not orders but also things which
claim to be orders but which it cannot decipher AS orders?

> In practice, the overwhelming majority of orders
> from a given source exhibit exactly the same structure, 

Not in the document world. At best they exhibit a highly similar
structure. Consider trying to decipher XHTML merely from examples that
you collect.
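
Two perfectly valid fragments may share almost no structure. A made-up
pair:

  <p>One paragraph of plain text.</p>

  <p id="x">Another with <em>nested <a href="#x">inline</a>
  markup</em>, a <br /> break, and an <img src="i.png" alt="" />.</p>

No number of collected examples tells you which further combinations
are legal; only the specification (or a schema) does.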

> ... and the internal application data
> structure required can therefore be immediately instantiated on the model of previously
> successful instantiations in the history.

What if repeat customers are rare in my business?

> Where a given instance is different from the
> usual pattern from that source, the change is usually small and 
> quite often actually occurs in some portion of the offered input 
> which is not used by this particular application. In
> such cases, again, the locally required data instantiation 
> can be accomplished immediately.

What about an attribute that says whether the document is a "buy" or a
"sell"? If your application does not know to look for that attribute,
it might treat every document as a "buy".
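
Concretely (a hypothetical instance, using the same made-up vocabulary
as the sketch above):

  <order side="sell">
    <account>12345</account>
    <instrument>XYZ</instrument>
    <quantity>100</quantity>
  </order>

A consumer that inferred its model from earlier examples which never
carried the side attribute will silently treat this as a buy. A schema
would at least have told it that the attribute exists and matters.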

> Failing both of those routes to instantiation, there are still 
> only a few fields which are likely to appear in a securities 
> order, and only a subset of those are of interest to this
> Malaysian order execution software.

But do you agree that in some circumstances the allowed list of fields
is much larger, the legal input data is much more variable and the
chance of succeeding with heuristics and regular expressions is much
lower?

> Bear in mind also that in securities processing every step of
> execution is followed by a step of comparison of the 
> outcomes between counterparties.

Please outline this comparison process. It seems key.

How do we compare these if you are speaking German and I am speaking
Swahili? What if what I called "to" you interpreted as "from", reported
back to me incorrectly, and I then applied the reverse heuristic to
your result report?
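
Concretely (a made-up exchange):

  I send:         <transfer from="A" to="B"/>
  you read it as: funds move from B to A
  you report:     <transfer from="B" to="A"/>
  I read that as: funds move from A to B

Each of us applies our own reversed reading, the comparison "succeeds",
and the error survives the very step that was supposed to catch it.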

Let's say I run a web service that generates PDFs. (Usually I'd call it
an RTF2PDF converter, but I'll take your advice and consider its output
the relevant distinguisher.) Now somebody hands me a TROFF file. I think
I can figure it out based on heuristics, regular expressions, a few
TROFF files I've seen before, and so forth. I generate the PDF. I claim
that if the client had the expertise required to know whether I have
accurately translated it, the client could have done the job themselves.
Therefore, as a client I wish to have a contract. If the contract says
"give me RTF as input", then maybe I have the capability of generating
RTF, and maybe the combination of my expertise in generating RTF and
yours in translating that into PDF will achieve a better result than me
throwing TROFF at you (not knowing that you only partially understand
it) and you doing a half-assed job with it.

> Yet there will of course be some very small number of 
> input documents which the instantiation layer can do nothing with, 
> particularly when it is seeing a form of data
> input for the first time. Humans will have to get involved here.

The problem is that my transaction is delayed. Once again, I might have
been better served by a clear statement of input requirements; input
meeting them could be expected to be handled in a reasonable amount of
time (i.e. on computer time scales, not human time scales).

What happens when involving humans is simply not economically feasible?
Perhaps I am running an XML service (e.g. Google) which only makes money
through goodwill or perhaps some day through micro-payments. Perhaps
throwing a variety of heuristics at the problem is similarly cost
prohibitive. Then potential clients have two options: present the
information in a manner I expect or get lost. Would you not agree that
publishing a schema is the best thing to do in that situation? Yes, I am
pushing my requirements up to the client, but that is because the
client is more economically able to deal with the situation than I am.
Similarly, corporations will routinely say: "I can't give you that loan
until you fill out this form." So we adapt to the process rather than
fight to get them to adapt to us.

Furthermore, if having a "standardized input" shifts some burden from
the information consumer to the producer, can you agree that totally
unstandardized input shifts some burden in the opposite direction?
After all,
you've described how you need to maintain logs, write regular
expressions, and kick exceptions to human processors. It is commonly
accepted that using a standard vocabulary is a way of meeting in the
middle. You probably won't have to write totally custom code for it
because you may have other customers that use the SDV. I probably won't
have to write totally custom code for it because I may have other
suppliers that use the SDV.
-- 
 Paul Prescod




 
