xml-dev - Re: Why the Infoset?

From: "W. E. Perry" <wperry@fiduciary.com>
To: XML DEV <xml-dev@lists.xml.org>
Date: Sat, 29 Jul 2000 11:52:42 -0400

Rick JELLIFFE wrote:

> I disagree.

I believe that we actually agree on the salient points and disagree only on the question
of how (and where) processing should be done (and processing decisions made).

> The basis of SGML'86 was that the rooted, directed, cyclic graph with attribute-value
> tree framework that allowed a handful of general distinctions on the edges (child,
> parent, next, attribute, IDREF, etc) and a handful of general types on the named nodes
> (element, comment, PI, etc), when coupled with a simultaneous rooted graph of
> entities with a handful of general types (NDATA, sgml), was sufficient for an enormous
> number of complex problems.  On top of this information model, the need to cope with an
> enormous number of possible notations and syntaxes.

I accept and agree with this characterization of SGML. Without disturbing these premises
in the cases where they are useful, the concept of well-formedness in XML 1.0 posits an
entirely different starting point:  effectively it is to dismiss the graph mechanism and
the tree metaphor, the apparently necessary 'general types', and even the abstraction of
a document node, and to deal instead solely with a mechanically correct syntax.

> XML's WF-only is not "entirely" self-sufficient for anything to do with graphs.

I never said that it was, for graphs or for anything else which the processor of a
document must 'do'. I note, for example, that "the true content model of an instance
document might be uniquely derived at the time and place of its use". That requires an
appropriate processor, which makes appropriate assumptions and affords appropriate
ancillary resources for that time and place. The instance document on which that
processor acts is not 'entirely self-sufficient' for the execution of that process (if it
were, it would qualify for my objection to SOAP); it is simply that the document is
entirely self-sufficient *as syntax* for input to that processor. That is, the document
does not require a DTD, schema, or stylesheet in order to be interpreted by the processor
in a way which is appropriate for the unique circumstances of its processing.
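
A minimal sketch of what "self-sufficient *as syntax*" can mean in practice, assuming Python's standard-library parser and a hypothetical order-ticket vocabulary (the element names are illustrative and not from the original message). The parser is given no DTD, schema, or stylesheet; it only decides whether the text is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the vocabulary is an assumption for illustration.
TICKET = "<order><side>buy</side><symbol>XYZ</symbol><qty>100</qty></order>"


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, with no DTD or schema consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(TICKET))                      # properly nested tags
print(is_well_formed("<order><side>buy</order>"))  # mismatched end tag
```

Everything beyond this check (types, links, scope) is left to whatever processor receives the document.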

> (The problem is only fixed by hard-coding, i.e. XLink, and hardcoding requires
> universal names, i.e. Namespaces using some public-identifier-like registration system,
> i.e. URIs)

You (and the Infoset) want these hardcoded in the document or registered in some
canonical form at a fixed address. I (and the philosophy of WF, I assert) want these
universalities to be determined by the processor as appropriate for the particular
circumstances in which a document instance is processed.

At this point in the argument I always use the same examples, but those examples are from
working, production systems which I have built for many years and to which many millions
of real dollars are committed daily. Consider a securities order ticket. The 'intended'
use of that document is to instruct a trader to execute a buy or sell in a given
security, subject to optionally given conditions. Yet in the full sequence of processing
which that trade, once executed, requires there are necessary subsequent uses of that
document which its creator may be only vaguely aware of, and for which he did not--and
did not know how to--provide or specify the necessary 'universalities' for that
document's subsequent correct interpretation.

After execution, that order ticket must be routed for trade comparison, cashiering,
securities receive and deliver, custody notification, regulatory compliance, and
portfolio analysis. In all of these cases it *should* be the original order ticket
document, rather than some re-statement of it, which is routed for input to a process
entirely unaccounted for in the original composition of that document and utterly absent
from whatever 'intent' might have been expressed by that document's creator. Re-using
that same document not only avoids errors of transcription but makes the necessary
auditable connection between that document and the outcome of such processes as cash
payment, securities delivery and tax reporting.

The only way to re-use that document appropriately is for each processor which acts upon
it to resolve references, scope and links, and to execute transformation, in a manner
which is unique to the circumstances of that particular process and dependent not upon
any such universalities hardcoded in the document or available from some single canonical
reference, but expressed as the particular expertise of that processor for the
specialized job which it does.
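
The order-ticket argument can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any production system: the ticket vocabulary, the `id` attribute, and the two function names are all hypothetical. The point shown is only that two downstream processes read the same unmodified document and each elaborates its own structure from it.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order ticket, routed unchanged to every downstream process.
TICKET = """<order id="T-1001">
  <side>buy</side>
  <symbol>XYZ</symbol>
  <qty>100</qty>
  <price>25.50</price>
  <account>ACME-01</account>
</order>"""


def cashiering_view(text: str) -> dict:
    """Cashiering's local reading: who owes how much cash."""
    root = ET.fromstring(text)
    cash = Decimal(root.findtext("qty")) * Decimal(root.findtext("price"))
    return {"ticket": root.get("id"),
            "account": root.findtext("account"),
            "cash_due": cash}


def delivery_view(text: str) -> dict:
    """Receive-and-deliver's local reading: a signed share movement."""
    root = ET.fromstring(text)
    qty = int(root.findtext("qty"))
    signed = qty if root.findtext("side") == "buy" else -qty
    return {"ticket": root.get("id"),
            "symbol": root.findtext("symbol"),
            "shares": signed}


print(cashiering_view(TICKET))
print(delivery_view(TICKET))
```

Both results carry the same ticket identifier, which is one simple way to keep the auditable connection back to the original document.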

> What XML did was to say that a lot of users only need simple AV-trees, so let's allow
> them to have them with little fuss.

They are not necessarily trees. That metaphor is only one of many possible semantic
elaborations, by an instance process, from simple WF syntax.

> And it said that people could agree on a syntax.

That is, in fact, all that WFness dares to claim.

> I think this is too hard on schemas: what a schema does, in part, is specify which
> additional constraints the document has more than WF XML. These constraints allow more
> optimal handling of data: if I know that my content model is (a,b,c)+ and
> that it is closed, then I can allocate a list with three slots for them and I know that
> the XPath  a[position()=1]  on the parent will always succeed. If I know that an
> element is a date, I can store it in a database as a date not a string. If I know that
> a value or combinations of value is unique, I can use them as keys for faster access to
> data.

I agree entirely with this. It is the point of schemas. The question is whether those
schemas are pre-ordained or are derived in a form specifically meaningful and useful to
the instance process.
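
One hedged reading of a schema "derived in a form specifically meaningful and useful to the instance process" is a constraint that the processor itself owns and applies, rather than one the document declares. The sketch below assumes the quoted content model (a,b,c)+ and expresses it as a regular expression over child element names; the function name and the `<p>` wrapper element are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Processor-local constraint: the children must match (a,b,c)+.
# Each child name is followed by a comma so that names cannot run together.
CONTENT_MODEL = re.compile(r"(?:a,b,c,)+")


def matches_local_model(text: str) -> bool:
    """Check the children of the root element against this processor's model."""
    root = ET.fromstring(text)
    names = "".join(child.tag + "," for child in root)
    return CONTENT_MODEL.fullmatch(names) is not None


doc = "<p><a/><b/><c/><a/><b/><c/></p>"
if matches_local_model(doc):
    # The model guarantees that a first <a> child exists, as in the quoted XPath example.
    first_a = ET.fromstring(doc).find("a")
    print(first_a.tag)
```

A different processor could apply a different model to the same document without either model appearing in the document itself.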

> The idea of a syntax with no schema/DTD is hardly new: in part, it was the infelicities
> of these that caused SGML'86 to take such a strong and radical view: if anyone can put
> any element anywhere, how can a consumer contractually require an information producer
> to produce certain information?  LISP or ADA or any of the languages with
> position-independent and nameable parameters provides the same basic capabilities as
> XML WF: why are they not good markup languages?   It is the ability to constrain data
> by schemas that is the key.

I agree with your statement of the problem. I (and, I assert, the WF philosophy) propose
a radically different approach to its solution than do you and SGML'86.

> If data were all atomic, and each datum was described by a universal name, and no two
> documents were similar, then I think William's view would be pretty close to the mark:
> documents could be ad hoc assemblages of elements used by applications which handled
> each document as best it could, perhaps with the aid of private schemas to check that
> all the information required was present.
> But truth is not atomic: a number may be complex, a quantity will have a unit as well
> as a value, a table has rows, love and marriage go together like a horse and carriage.

This is the crux of the matter. In the instance document, understood purely as WF syntax,
the data *is* effectively atomic and for that reason effectively useless. A particular
processor running a particular process in particular circumstances is required to
elaborate a useful instance data structure and, with it, locally useful semantics from
atomic data supplied by the document, together with other inputs appropriate for that
process. As Demokritos realized, there are only atoms and nothingness--all the rest is
opinion--until the moment that something happens, that something is done, which for that
particular purpose in those particular circumstances requires a particular instance
structure, or opinion of the appropriate form, relationships and, by elaboration,
semantics of that data.

Respectfully,

Walter Perry