   Re: [xml-dev] Hobbsian processes


Paul Prescod wrote:

> I'm coming to understand little by little. In a sense it is just "be liberal in what you
> accept."

No, it's not. It is "be very specific about precisely what you need to execute a process most
expertly". At the same time, acknowledge that in the internetwork topology you are not in a
position to impose your data requirements upon an upstream process.

> It seems to me that the decision of whether upstream conforms to downstream or vice versa
> depends entirely on the nature of the relationship between them. As you have said
> elsewhere:
>
> "Now, clearly, a mom-and-pop shop wanting to leverage the Web into a supplier contract to
> General Motors is perfectly happy to label its wares however GM expects."
>
>  * http://www.xml.com/lpt/a/2002/05/29/perry.html

That seems incontrovertible, so far as it goes, but what I am trying to do is describe a
general processing model in which no data 'supplier' has to cater to the particular needs of
a data consumer, in large part because the relationship between them is never exactly that of
supplier and consumer. In the best of circumstances, each is actually an autonomous processor
whose output is the finest expression of its expertise. Because of that quality of output
from one process, another process chooses, as an expression of its own expertise in data
collection, to use that output as the basis for instantiating some portion of its own input
requirements.

> If several small services find themselves in this position of having to conform to the
> whims of many large organizations, then they might choose to band together and define a
> standard which gives their interface an economy of scale that can compete with that of the
> big players.

This is precisely what I hope to *avoid*. Economies of scale are realized at the cost of
compromising each process's autonomous expertise with concessions that it must make to
achieve the agreed common denominator. I think that the internetwork topology, exploited with
the processing model I describe, permits autonomous virtuoso services to be combined ad hoc
in the particular instance to effect a quality of outcome which the mastodons cannot achieve.

> > I would say that the primary feature of XML documents is that each is explicitly a data
> > structure.
>
> Not as the word is used here: http://www.nist.gov/dads/. Nobody will ever refer to the XHTML
> "data structure".

Sorry, any XML *instance* is composed of content, which may always be regarded as data,
juxtaposed in composition into a particular, discernible structure, which structure is
further made explicit, at least in part, by the markup. That is a data structure. Its schema,
if it has one (and one can always be deduced from the instance), describes its form so that
further instances of the same structure might be produced.
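To illustrate concretely (a minimal sketch in Python; the element and attribute names are hypothetical), a structure can be deduced from nothing but the instance itself along these lines:

    import xml.etree.ElementTree as ET
    from collections import defaultdict

    def deduce_structure(xml_text):
        """Walk an XML instance and record, per element name, the child
        elements and attributes observed. This is a rough schema deduced
        purely from the instance itself."""
        shape = defaultdict(lambda: {"children": set(), "attributes": set()})

        def walk(elem):
            shape[elem.tag]["attributes"].update(elem.attrib)
            for child in elem:
                shape[elem.tag]["children"].add(child.tag)
                walk(child)

        walk(ET.fromstring(xml_text))
        return {tag: {"children": sorted(v["children"]),
                      "attributes": sorted(v["attributes"])}
                for tag, v in shape.items()}

    # A hypothetical order instance:
    print(deduce_structure(
        "<order side='buy'><security>XYZ</security>"
        "<quantity>1000</quantity></order>"))

From such a deduction, further instances of the same structure can indeed be produced, which is all I claim for the schema.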

> How did you push XML upstream to your business partners? Why could you not push a
> particular XML vocabulary up to them?

First of all, these are not our business partners. Customers for our software and services
want to do trades--or at least have at the ready the capacity to do trades--with new
counterparties whose systems (and culture, currency, regulations, business practices, etc.)
are different. They do not want to pay a commercial intermediary on each trade and thereby
lose the control of the terms of execution which comes with their principal position. They
do not want to understand each of their counterparties' (or more exactly, potential
counterparties') practices, nor reach agreement on a common set of practices, nor build and
maintain a portal to do bi-directional transformations with every counterparty for every
different sort of trade. They can, and will, however, induce potential counterparties to
trade with them by offering quite simply the prospect of a trade from which each of the
parties clearly sees its own profit. We, as a supplier, are simply part of the expected
overhead in the execution. We intervene by routing the workflow a couple of extra hops on the
internetwork, which is transparent and utterly invisible to the parties we stand between.

> But what if the opposite is true? What if the form lacks information required by Malaysian
> law?

Actually, this is usually the case, and once again illustrates why the expert process must
control its own data collection and instantiation, most particularly when the nominal
supplier of the data is incapable of providing some crucial piece. In the particulars of the
example, the money manager in heartland USA probably cannot even be made to understand what
it is being asked for by Malaysian regulation, let alone figure out how to supply it. The
only place where this job can be reliably entrusted is the data instantiation process
operating locally in Malaysia, where it has access to the whole of the internal data
structure used by the application for which it instantiates the data. How in this case could
you possibly design a pre-agreed input data structure which puts the onus on the 'caller' to
supply something it doesn't understand and wouldn't know how to find?
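A sketch of what I mean (the field names and the local regulatory lookup are hypothetical stand-ins):

    # The fields Malaysian regulation requires but which no caller in
    # heartland USA could be expected to supply; they are resolved here,
    # from local data, rather than demanded upstream.
    def lookup_tax_status(counterparty):
        # Stand-in for a query against locally maintained regulatory data.
        return "exempt" if counterparty.endswith("_QFI") else "standard"

    LOCALLY_SOURCED = {
        "kl_tax_status": lambda offered: lookup_tax_status(
            offered.get("counterparty", "")),
        "settlement_venue": lambda offered: "KLSE",
    }

    def instantiate_order(offered):
        """Build the application's internal order from whatever the
        upstream process offered, then fill the locally required fields
        from local expertise instead of pushing the burden upstream."""
        order = {k: offered[k]
                 for k in ("security", "quantity", "side") if k in offered}
        for field, resolve in LOCALLY_SOURCED.items():
            order[field] = resolve(offered)
        return order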

> Sometimes only the process that calls has the extra information you need to do your job.

I have *never* encountered this case in practice. The 'caller' process (and I hope you see
that it isn't really that) puts what it knows into the output it produces. If some receiver
or downstream user of that output needs something that isn't there, it seems a safe bet that
it cannot get that extra information from the process that put together the original data
which lacks it.

> You can interrogate the caller but if that fails (because the caller is too stupid or
> speaks too different a language to understand your questions) then the transaction fails.
> The programmer of the caller will then send you an email asking: "What the hell does your
> process need for its input?" Would it not be polite to have a schema available? Once again,
> this presumes that it is more economically logical for THEM to change their behaviour than
> YOU to change yours. Sometimes that's the case, sometimes it isn't.

For the reasons I have given above, it is always pointless--whether in the instance or in the
creation of what is to be an agreed interface--to ask for what it hasn't occurred to your
counterparty that you will need.

> No doubt. But the solution most industries take is a standard.

The problem in the broadly-understood trading-of-anything-fungible group of nominal
industries is that there are so many standards, occasioned by different, often overlapping,
jurisdictions of law and regulation as well as by differences of business practice in niches
which see themselves (mistakenly) as independent markets. New opportunities for profit, which
will be discovered and exploited, predictably exist in combinations or arbitrages between
instruments found in different markets which do not regard themselves as having any
connection to each other. They don't sit down to work out common standards because the great
majority of participants in each of those markets do not see themselves as having anything in
common with the other market. That unawareness is precisely why the few who find ways to
arbitrage between those markets can realize their profits. After the first extraordinary
profits of those arbitrageurs, industry practice is often to build transformation portals
from one market to the other. I think that the processing model I am suggesting--based on
doing in each milieu what works best in that milieu without constraints to accommodate any
other--is much more like the method of the early arbitrageurs.

> For instance rather than a hundred hypertext languages we standardize on HTML. Rather than
> a hundred labelled tree languages, we standardize on XML. Rather than a hundred character
> sets, we standardize on Unicode. Do you feel that it is inappropriate for a downstream
> process to push *any* requirement on an upstream one? Should my Docbook to PDF converter
> attempt to handle EBCDIC?

Ours is a discussion predicated on XML, seeking to articulate useful general ways to process
it. Beyond that, it may be that we do not have any shared standards.

> Now I want to understand why you reject this typical solution. You have railed against
> these "Standard Data Vocabularies" as you call them. You claim (in the same May 2002
> article as above): "I have argued for years that, on the basis of their mechanism for
> elaborating semantics, SDVs are inherently unreliable for the transmission or repository of
> information."
>
> In that article, you say:
>
> > [P]ower to shape the outcome of a patent process has been shifted
> > to [filers] by the SDV. By design, the patenting process will
> > begin with the filer's own assertions conveyed in the SDV.
>
> Just as today. When I go for a loan I fill out a form and my assertions *start* the
> process. But they do not necessarily end it. I do not understand how the situation would be
> better if the input to the process were NOT declared anywhere and were just implicit.

In the article which you quote I am darkly pessimistic about particular uses of two standard
data vocabularies--the EPO/JPO/WIPO patent filing system and XBRL. Both are explicitly
promoted as useful for the apparently laudable goal of 'breaking down the silos of
information'. The strategy for doing that is to separate the collection of an identified body
of data from the subsequent massaging, manipulating or reporting of that data (shades of
separation of content and presentation?). The idea is that instead of a monolithic silo of
vertically integrated process, we can have modular data creation processes and data
manipulation or reporting processes, with a shared, agreed-upon standard data vocabulary |
common data structure | (call-it-what-you-will) between them. Data collected into XBRL
structures for, say, tax reporting, could be manipulated instead by tools for securities
analysis. Clearly the goal is to achieve something that many of us are after: the ability to
couple one service to another and thereby replace the silos of single-purpose data processing
with best-of-breed tools mixed and matched to serve a myriad of purposes.

The problem is that a significant portion of expert process is specific expertise in data
collection and instantiation. With XBRL, which deals with accountancy, this problem should be
glaringly obvious. The Arthur Andersen auditors of Enron were asked in Congressional testimony
how they assembled and tested the numbers which formed the basis of their satisfactory audit
opinion. It turned out that they had merely accepted the numbers supplied by (as was latterly
discovered) corrupt management. Audit accounting, like most expert functions, needs to
control the process from data creation through final reporting, or the garbage input in the
form of the standard data vocabulary will flow right through as garbage output presented as
the 'expert' outcome.

> Through experiments I could still find the holes in your process, the places where you do
> not check assertions or preconditions carefully.

Yes you could, in which case my process would be demonstrably less expert than I might have
wanted to assert. But once you find those holes, fixing them properly will inevitably
require that I take further upstream control of my data selection and instantiation. At some
point in doing that fixup I am entitled to wonder why bother with the standard or pre-agreed
data interface at all; let me find the ultimate sources of the data I need and take
responsibility ab initio for its expert selection and instantiation.

> Furthermore, people "game" all systems including Unicode and IP. Should those standards be
> removed and replaced with heuristics?

No.

> > Particular combinations of components from the SDV which might
> > seem illogical to designers of that vocabulary may be found
> > to result in process outcomes which benefit the submitters
> > in ways never anticipated by designers.
>
> First, please give an example. Second, please describe how a situation without an SDV would
> be better.

See above on Andersen/Enron.

> > It may, of course,
> > turn out not to be an order, in which case it will have to be rejected
> > because the order execution application can do nothing with it.
>
> May it discard not only things that are not orders but also things which claim to be orders
> but which it cannot decipher AS orders?

Of course.

> > In practice, the overwhelming majority of orders
> > from a given source exhibit exactly the same structure,
>
> Not in the document world. At best they exhibit a highly similar structure. Consider trying
> to decipher XHTML merely from examples that you collect.

My world is as I describe it, and I will accept that yours is as you describe. That things
are different in your domain than in mine does not of itself impugn what I have to say about
the world I know.

> > ... and the internal application data
> > structure required can therefore be immediately instantiated on the model of previously
> > successful instantiations in the history.
>
> What if repeat customers are rare in my business?

Again, I am describing from substantial experience the relative usefulness of each of the
heuristic strategies which I employ. Given entirely different circumstances the relative
usefulness of the strategies certainly might shift and for efficiency it might be a good idea
to apply them in a different order.
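In rough Python (the names and mechanics are mine, purely for illustration), the ordering of strategies looks like this; a fallback such as the regular-expression sketch further down would slot in after the history strategy:

    class NeedsHuman(Exception):
        """Raised when no heuristic succeeds and a person must look."""

    def shape_of(doc):
        # A crude structural fingerprint: the sorted field names.
        return tuple(sorted(doc))

    def from_history(source, doc, history):
        """Most orders from a given source exhibit exactly the same
        structure, so reuse the field selection that worked last time."""
        for shape, fields in reversed(history.get(source, [])):
            if shape == shape_of(doc):
                return {k: doc[k] for k in fields}
        return None

    def instantiate(source, doc, history, strategies=(from_history,)):
        """Apply the heuristics in order; each failure falls through to
        the next strategy, and ultimately to a human."""
        for strategy in strategies:
            fields = strategy(source, doc, history)
            if fields is not None:
                history.setdefault(source, []).append((shape_of(doc), fields))
                return fields
        raise NeedsHuman(f"no heuristic succeeded for {source}")

Note that the first document from a previously unseen source falls through every strategy, which is exactly the case where humans get involved.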

> > Where a given instance is different from the
> > usual pattern from that source, the change is usually small and
> > quite often actually occurs in some portion of the offered input
> > which is not used by this particular application. In
> > such cases, again, the locally required data instantiation
> > can be accomplished immediately.
>
> What about an attribute that says whether the document is a "buy" or "sell"? If your
> application does not know to look for that then it might always think it is looking for a
> "buy".

Why do you think that a process designed with true domain expertise would make such a stupid
assumption?

> > Failing both of those routes to instantiation, there are still
> > only a few fields which are likely to appear in a securities
> > order, and only a subset of those are of interest to this
> > Malaysian order execution software.
>
> But do you agree that in some circumstances the allowed list of fields is much larger, the
> legal input data is much more variable and the chance of succeeding with heuristics and
> regular expressions is much lower?

You are now talking about problems which need to be properly factored if we are to exploit
the benefits of the processing model I advocate. That factoring is a necessary part of
designing applications on this model.
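That factoring granted, the 'few likely fields' fallback in the example need be nothing grander than this (the patterns and field names are purely illustrative):

    import re

    # The handful of fields this hypothetical order execution software
    # cares about, matched loosely against the raw markup.
    FIELD_PATTERNS = {
        "security": re.compile(
            r"<(?:security|symbol|instrument)>\s*([^<]+?)\s*<", re.I),
        "quantity": re.compile(r"<(?:quantity|qty|shares)>\s*(\d+)\s*<", re.I),
        "side": re.compile(r"<(?:side|action)>\s*(buy|sell)\s*<", re.I),
    }

    def likely_fields(raw_xml):
        """Pull out whichever of the expected fields can be found;
        anything unresolved is left for the next strategy or a human."""
        found = {}
        for name, pattern in FIELD_PATTERNS.items():
            m = pattern.search(raw_xml)
            if m:
                found[name] = m.group(1)
        return found or None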

> > Bear in mind also that in securities processing every step of
> > execution is followed by a step of comparison of the
> > outcomes between counterparties.
>
> Please outline this comparison process. It seems key.

In the terms of the example, the order execution system reports to me (in the form of an
output document), that I sold 1000 shares of XYZ security to dealer PQR. I route a copy of
that report to PQR, who must have a matching report from the order execution system. If PQR
does not, or if I do not receive a report from PQR which matches the one which I received
from the order execution system, then we have an outtrade, or non-match, and the trade is
cancelled.
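In code, the comparison amounts to no more than this (the report fields are illustrative):

    def compare(my_report, counterparty_report,
                keys=("security", "quantity", "price", "counterparty")):
        """Match the execution report I received from the market against
        the one my counterparty received. Any discrepancy is an outtrade
        and the trade is cancelled rather than settled."""
        if counterparty_report is None:  # no matching report at all
            return "outtrade", list(keys)
        mismatched = [k for k in keys
                      if my_report.get(k) != counterparty_report.get(k)]
        return ("matched", []) if not mismatched else ("outtrade", mismatched)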

> How do we compare these if you are speaking German and I am speaking Swahili? What if what I
> called "to" you interpreted as "from", and you then reported back to me wrongly, but I
> applied the reverse heuristic to your result report?

Comparisons, in the technical sense that I have just illustrated, are always presented by
both parties in the terms of the market where they have met to execute the transaction being
compared. By design, the characteristics of each participant away from that market are
immaterial to this process.

> Let's say I run a web service that generates PDFs (usually I'd call it an RTF2PDF converter,
> but I'll take your advice and consider its output the relevant distinguisher).
> Now somebody hands me a TROFF file. I think I can figure it out based on heuristics,
> regular expressions, a few TROFF files I've seen before and so forth. I generate the PDF.

So far, so good.

> I claim that if the client had the expertise required to know whether I have accurately
> translated, the client would have been able to do the job themselves.

Quite possibly, but what of it?

> Therefore, as a client I wish to have a contract.

Non sequitur.

> If the contract says: "give me RTF as input" then maybe I have the capability of generating
> RTF and maybe the combination of my expertise in generating RTF and yours of translating
> that into PDF will achieve a better result than me throwing TROFF at you (not knowing that
> you only partially understand it) and you doing a half-assed job with it.

I'm sorry but I don't see a point here. The possibility that a pair of expert services might
be conjoined in an inappropriate or even less than optimal way does not of itself indicate
that they should, instead of risking that possibility, have an a priori agreement on the one
and only way that they will interface (presumably through a single common data structure)
with each other.

> > Yet there will of course be some very small number of
> > input documents which the instantiation layer can do nothing with,
> > particularly when it is seeing a form of data
> > input for the first time. Humans will have to get involved here.
>
> The problem is that my transaction is delayed. Once again, I might have been better served
> with a clear statement of input requirements which I could expect to be handled in a
> reasonable amount of time (i.e. on computer time scales, not human time scales).

This is standard business practice in securities trading (which is the domain of the
example), even in the absence of any automated processing. Your transaction will usually not
be delayed beyond its regular settlement date, or if it is then it is because that particular
transaction is flawed in ways which *should* keep it from settling.

> What happens when involving humans is simply not economically feasible? Perhaps I am
> running an XML service (e.g. Google) which only makes money through goodwill or perhaps
> some day through micro-payments. Perhaps throwing a variety of heuristics at the problem is
> similarly cost prohibitive. Then potential clients have two options: present the
> information in a manner I expect or get lost.

That Hobson's choice is too often a feature of current services. We can treat our users
better than that.

> Would you not agree that publishing a schema is the best thing to do in that situation?

No. That is simply another presentation of the same Hobson's choice.

> Yes, I am pushing my requirements up to the client

Indeed you are.

> but that is because the client is more economically able to deal with the situation than am
> I.

Maybe, maybe not. But I think you have lost sight of the fact that you set out to provide a
service: now you not only fail to provide one but are gratuitously in your (potential)
client's face.

> Similarly, corporations will routinely say: "I can't give you that loan until you fill out
> this form." So we adapt to the process rather than fight to get them to adapt to us.

That's one way to do business, yes, and I might put up with it for a while if I really had no
other choice for something that I could not do without. However, the day a service was
sufficiently expert in marketing to reuse a document which I had published for some other
purpose, as the basis for offering me something I needed, I'd likely reward their initiative
with my business.

> Furthermore, if having a "standardized input" shifts some burden from the information
> consumer to the producer, can you agree that totally unstandardized input shifts some
> burden in the opposite way?

Of course it does. This is a *good* thing (and in the case of preventing fraud by controlling
data selection and instantiation, a necessary act of prudence).

> After all, you've described how you need to maintain logs, write regular expressions, and
> kick exceptions to human processors. It is commonly accepted that using a standard
> vocabulary is a way of meeting in the middle. You probably won't have to write totally
> custom code for it because you may have other customers that use the SDV. I probably won't
> have to write totally custom code for it because I may have other suppliers that use the
> SDV.

There is no middle, in the sense of stable, identifiable ground where vastly different
processes can reliably be expected to meet. There is only the particular interface of one
instance of one process making use, for its own purposes, of one instance of the output of
another process.

And so to bed, with sincere apologies to the list for the length of this. I felt that Paul's
questions deserved complete answers, even if I have now consumed my next month's quota of
xml-dev bandwidth.

Respectfully,

Walter Perry






 
