1/14/2002 2:08:03 PM, Nicolas LEHUEN
I wouldn't disagree on the "self describing" bit; tags are
just labels that have to refer to something else that defines
their semantics. My point vis a vis CSV was simply that a
tag is a lot better than nothing (or a header somewhere far
away) when you're debugging.
> Any given XML document requires a schema, and not only for
> validation....an XML application has to rely on an implicit
> or explicit schema to process XML documents meaningfully,
> i.e. at the semantic level, because it is the schema that
> creates the document semantics.
> Well-formedness alone is a lure ... If you don't write the
> schema explicitely, its ghost will appear in your programs
> anyway, created by the assumption the program has to make
> to run properly.
THIS is the kind of thing I had in mind when I referred to us
talking past each other on xml-dev <grin>
I guess I disagree about the *general* applicability of the
situation that Nicolas Lehuen describes. Simon put it quite
nicely (emphasis and parenthetical notes added) :
"...accept that information may not *always* come in
precisely the same structure. [when it doesn't] Write code
which supports flexibility rather than demanding conformity.
[you can] Throw away notions of strict conformance to
semantical notions - rely only on syntactical conformance."
In a loosely coupled application you may know very little
about the data other than it is well-formed XML, and the job
of an application component is to extract whatever
information APPEARS to match the patterns it is looking for,
put the information in a more useable form, and pass it down
the pipeline for further processing. A network of these
simple components can do some quite interesting things, and
tools such as Sean McGrath's XPipe stuff and Software AG's
EntireX Orchestrator are becoming available to develop them.
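As a rough illustration of such a component, here is a minimal sketch in Python using only the standard library. It is not how XPipe or EntireX Orchestrator work; the function name extract_invoices, the wanted parameter, and the field names "number" and "total" are all hypothetical, chosen only for the example. The only thing it demands of its input is well-formedness: it picks out whatever looks like an invoice, ignores markup it does not recognize, and hands a plain structure to the next stage.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if any, from a tag name."""
    return tag.rsplit("}", 1)[-1]


def extract_invoices(xml_text, wanted=("number", "total")):
    """Return one dict per element that APPEARS to be an invoice.

    Hypothetical pipeline stage: the field names in `wanted` are
    assumptions made for this example, not part of any real vocabulary.
    Unknown elements are ignored; missing fields are simply left out.
    """
    # The only hard requirement: the input must be well-formed XML.
    # ET.fromstring raises ET.ParseError otherwise.
    root = ET.fromstring(xml_text)

    results = []
    for elem in root.iter():
        # Match on the local name, case-insensitively, at any depth.
        if local_name(elem.tag).lower() != "invoice":
            continue
        found = {}
        # Look at every descendant, so extra wrapper elements do no harm.
        for child in elem.iter():
            name = local_name(child.tag).lower()
            if name in wanted and name not in found:
                text = "".join(child.itertext()).strip()
                if text:
                    found[name] = text
        results.append(found)
    return results


if __name__ == "__main__":
    doc = (
        "<order><Invoice><p>rush job</p><number>42</number>"
        "<details><total> 9.50 </total></details></Invoice></order>"
    )
    # Pass the simplified records down the pipeline (here: just print them).
    print(extract_invoices(doc))
```

The stray <p> element and the unexpected <details> wrapper are skipped rather than treated as errors. A later stage (or a human) still has to decide whether what was extracted really is an invoice.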
This is a very different way of looking at XML (and data
processing for that matter) than the object-centric or
schema-centric approach. It solves one problem -- the lack
of authoritative schema for many application domains -- by
accepting a lot more chaos and error than many might find
tolerable. In any real system, there would have to be humans
involved to make sure that that purchase order that looks
like the deal of a lifetime is indeed what the pattern
matcher thought it was and not a joke, a fraud, or something
else entirely. But at least they won't reject the purchase
order of a lifetime because it had an extra <p> tag
somewhere. <grin, yeah I know this is a contrived example!>
More seriously, this is a way to exploit what order there is
in the system, i.e., an <invoice> tag probably refers to
something resembling an "invoice", without insisting on total
This is not to say that this "loose" approach is the best;
it's certainly not when you CAN authoritatively specify fixed
schemas and reject messages/documents that don't match them.
But it's better than handwringing about how XML can only be
used once everyone agrees on a schema for some particular
industry, as we see so often in the trade press.