Ack.
I was presenting the short form because the long form of the argument has been presented so many times in this forum that the real silliness is that we are doing it again with mostly the same partners in the conversation.
Essentially, for lack of a better term, unless there are contracts of some sort, the system operates in a caveat emptor mode. A schema/DTD, at some level of detail and complexity, can make that better given all the tradeoffs of system types and costs. There are other means, complementary or redundant. The same arguments apply to JSON as to XML, and if the XML systems do this better for some n of costs, XML is better. Otherwise it comes down to style, taste, affiliation, historical context, yadda yadda.
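As a minimal sketch of the contract idea (assuming Python with lxml; the DTD and element names are invented for illustration, not from any real system): the DTD is the contract, and an instance that drifts from it is rejected at the boundary instead of surfacing as bad data downstream.

from io import StringIO
from lxml import etree

# The "contract": a tiny, invented DTD for an order document.
DTD_TEXT = """
<!ELEMENT order (item+)>
<!ELEMENT item (sku, qty)>
<!ELEMENT sku (#PCDATA)>
<!ELEMENT qty (#PCDATA)>
"""
dtd = etree.DTD(StringIO(DTD_TEXT))

good = etree.fromstring("<order><item><sku>A-1</sku><qty>2</qty></item></order>")
bad = etree.fromstring("<order><item><qty>2</qty></item></order>")  # missing <sku>

print(dtd.validate(good))  # True: the instance honors the contract
print(dtd.validate(bad))   # False: caught up front, not in production
for entry in dtd.error_log:
    print(entry.message)   # says which part of the contract was broken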
Berners-Lee is a theoretician. Practical production requires more sensible guidance than he is willing to afford. Not exactly news at 11. Liam is a better resource for that.
len
-----Original Message-----
Ghost? Boo! I have four other angles:

1) Test-driven development. Before=as=so-soon-after-that-no-one-notices you make some software, you make a test for it. If the document has a fixed structure, you can test by instances. If the document is semi-structured or recursive, your test specification has to allow those kinds of structures too: and for XML such a specification is called a schema. (A sketch of such a test follows this list.)

2) Quality assurance. I work in a company with a globally distributed development and production system (it is so big that US content architects may forget they have brother content architects in other countries when casually posting :-). We need mechanisms to know whether foreign or domestic, in-house or outsourced editorial or production staff is working to specification. To do that, we need some kind of specification and a way to test against it: and again, such a specification is called a schema. We have used these to get down to Six Sigma levels of quality: how else can you get down to 3.4 failures per million opportunities?

3) Conway's Law. A successful system must have sub-system boundaries that match the organization. Formalizing a boundary that matches internal organizational boundaries helps reduce communication costs. Formalizing a boundary within a team needs to allow flexibility and agility, otherwise it will get in the way.

4) Length of the workflow. If data is going from A to B, there is less reason for a schema. If data is going from A to B to C to A to D to E to F to B to G, then you start to have big coordination costs, and opportunities for errors made early not to be caught until late: a fatal flaw in many cases. The schema is the contract and documentation.
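A minimal sketch of what such a test looks like, assuming Python with lxml and an invented schema and instance (nothing here comes from a real project): the schema is the test specification, and the test fails the moment a generated document drifts from it.

import unittest
from lxml import etree

# Invented test specification: a report is simply one or more paras.
XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="para" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
SCHEMA = etree.XMLSchema(etree.fromstring(XSD_TEXT))

class TestGeneratedDocuments(unittest.TestCase):
    def test_instance_meets_specification(self):
        doc = etree.fromstring("<report><para>hello</para></report>")
        SCHEMA.assertValid(doc)  # raises with a diagnostic if the spec is broken

if __name__ == "__main__":
    unittest.main()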
Where I am closer to Simon's view than perhaps others may be is that when we look at the above four motivating cases for using schemas, very often we do have some known process involved. So in fact static tests of the input and output documents are necessary but not sufficient: we need to check the input or output document against facts in the outside world that may be controlled by others or be dynamic (e.g., code lists), and we may need to check that transformations have indeed preserved information (e.g., if there are 5 paras in the input there should be 5 equivalent paras in the output).
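To make those two dynamic checks concrete, a rough sketch (Python with lxml again; the document shape, the code list and the supposed transformation output are all invented for illustration): one check compares an attribute against an externally controlled code list, the other confirms the transformation preserved the paragraph count.

from lxml import etree

# Invented, externally controlled code list; in practice it might be fetched from
# another system and change over time, which is why no static schema can own it.
CURRENCY_CODES = {"USD", "EUR", "JPY"}

src = etree.fromstring("<doc currency='EUR'><para>one</para><para>two</para></doc>")
out = etree.fromstring("<html><p>one</p><p>two</p></html>")  # pretend output of an XSLT step

# 1. A fact the schema alone cannot know: is the code on today's list?
assert src.get("currency") in CURRENCY_CODES, "unknown currency code"

# 2. Did the transformation preserve information? Same number of paras in and out.
assert len(src.findall(".//para")) == len(out.findall(".//p")), \
    "paragraphs lost or invented by the transformation"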
We need to make sure we don't use specific schemas where generic ones are more appropriate (e.g. simpler schemas with attribute values checked by other layers of validation, or partial schemas for envelopes and payloads), and that we have enough capability to convert generic to specific when tools require it.

Where I would disagree with Simon, I think, is that the advent of JSON for point-to-point interchange actually means that you should probably always use a schema with XML: if you don't need a schema, perhaps you should be using JSON?
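To make the envelope-versus-payload point above concrete, a rough sketch (Python with lxml; the envelope vocabulary is invented): the schema is strict about the wrapper but deliberately leaves the payload as a lax wildcard, so each payload vocabulary can be checked by its own layer of validation.

from lxml import etree

# Invented envelope schema: specific about the envelope, generic about the payload.
ENVELOPE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="envelope">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="sender" type="xs:string"/>
        <xs:element name="payload">
          <xs:complexType>
            <xs:sequence>
              <xs:any processContents="lax" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
envelope_schema = etree.XMLSchema(etree.fromstring(ENVELOPE_XSD))

msg = etree.fromstring(
    "<envelope><sender>acme</sender>"
    "<payload><invoice><total>42</total></invoice></payload></envelope>")

envelope_schema.assertValid(msg)  # the envelope is checked; the payload is someone else's contract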
Cheers
On Wed, Apr 10, 2013 at 12:36 PM, Len Bullard <cbullard@hiwaay.net> wrote:

Ghosts don’t speak unless spoken to.
Schemas/DTDs shine in environments where the generators are humans who are lazy, forgetful and/or badly trained. At my last job, of the 12 people working for me only one could read a DTD, yet all claimed to be XML experts in an application with a large and well-crafted DTD. Being the only other person on the hall who could read it was in no way job security, nor did it broker data goodness. They doggedly tagged 1500 work packages to the wrong part of the tree because the style sheet didn’t care, the customer didn’t look, and the files validated and ran in the system they were targeted to.
Contracts have to care. If they don’t then the humans won’t.
len
-----Original Message-----
How come Len Bullard doesn't kick in? This is so much about control.
On Wed, Apr 10, 2013 at 12:48 AM, Michael Sokolov <msokolov@safaribooksonline.com> wrote:
On 4/9/13 5:20 PM, Simon St.Laurent wrote:
On 4/9/13 5:16 PM, Toby Considine wrote: Sorry
Might be interesting as a straw man. I only
became aware of this piece of work at its tail end (years after any attempt at
standardization was abandoned, I think), when I guided some agonized engineers
through an implementation of a SOAP service in Perl that was supposed to provide
services to a Microsoft .NET consumer: these two software packages used
completely antagonistic approaches, as far as I could tell. The
"standards" were worse than useless; they should have been called web
services inoperability. There are at least two, maybe three completely
different interpretations of the SOAP vocabulary based on fundamentally
different conceptions of how to deliver web services, all masquerading under
the same heading of WS-I. There are layers of incomprehensible service
endpoint babbledygook that makes reading the actual markup nearly impossible:
it might as well be a binary format for all the benefit one gets from XML in
this arena. The current situation is that the only rational way to use SOAP is
to use two endpoints from the same provider, and never ever to look under the
covers at the XML that is being generated for you. At least that's how it
seemed to me as an infrequent user - I don't claim to be an expert. This
particular piece of software is the tar baby of our organization - touch it at
your peril.