Ack.
I was presenting the short form because the long form of the argument has been presented so many times in this forum that the real silliness is that we are doing it again with mostly the same partners in the conversation.
Essentially, for lack of a better term, unless there are contracts of some sort, the system operates in a caveat emptor mode. A schema/DTD, at some level of detail and complexity, can make that better given all the tradeoffs of system types and costs. There are other means, complementary or redundant. The same arguments apply to JSON as to XML, and if the XML systems do this better for some n of costs, XML is better. Otherwise it comes down to style, taste, affiliation, historical context, yadda yadda.
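As a minimal sketch of the contract idea (assuming Python with lxml; the DTD and element names are invented for illustration, not from any real system): the DTD is the contract, and an instance that drifts from it is rejected at the boundary instead of surfacing as bad data downstream.

from io import StringIO
from lxml import etree

# The "contract": a tiny, invented DTD for an order document.
DTD_TEXT = """
<!ELEMENT order (item+)>
<!ELEMENT item (sku, qty)>
<!ELEMENT sku (#PCDATA)>
<!ELEMENT qty (#PCDATA)>
"""
dtd = etree.DTD(StringIO(DTD_TEXT))

good = etree.fromstring("<order><item><sku>A-1</sku><qty>2</qty></item></order>")
bad = etree.fromstring("<order><item><qty>2</qty></item></order>")  # missing <sku>

print(dtd.validate(good))  # True: the instance honors the contract
print(dtd.validate(bad))   # False: caught up front, not in production
for entry in dtd.error_log:
    print(entry.message)   # says which part of the contract was broken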
Berners-Lee is a theoretician. Practical production requires more sensible guidance than he is willing to afford. Not exactly news at 11. Liam is a better resource for that.
len
-----Original Message-----
Ghost? Boo! I have four other angles:

1) Test-driven development. Before=as=so-soon-after-that-no-one-notices you make some software, you make a test for it. If the document has a fixed structure, you can test by instances. If the document is semi-structured or recursive, your test specification has to allow those kinds of structures too: and for XML such a specification is called a schema. (A sketch of such a test follows this list.)

2) Quality assurance. I work in a company with a globally distributed development and production system (it is so big that US content architects may forget they have brother content architects in other countries when casually posting :-). We need mechanisms to know whether foreign or domestic, in-house or outsourced editorial or production staff is working to specification. To do that, we need some kind of specification and a way to test against it: and again, such a specification is called a schema. We have used these to get down to Six Sigma levels of quality: how else can you get down to 3.4 failures per million opportunities?

3) Conway's Law. A successful system must have sub-system boundaries that match the organization. Formalizing a boundary that matches internal organizational boundaries helps reduce communication costs. Formalizing a boundary within a team needs to allow flexibility and agility, otherwise it will get in the way.

4) Length of the workflow. If data is going from A to B, there is less reason for a schema. If data is going from A to B to C to A to D to E to F to B to G, then you start to have big coordination costs, and opportunities for errors made early not to be caught until late: a fatal flaw in many cases. The schema is the contract and documentation.
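A minimal sketch of what such a test looks like, assuming Python with lxml and an invented schema and instance (nothing here comes from a real project): the schema is the test specification, and the test fails the moment a generated document drifts from it.

import unittest
from lxml import etree

# Invented test specification: a report is simply one or more paras.
XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="para" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
SCHEMA = etree.XMLSchema(etree.fromstring(XSD_TEXT))

class TestGeneratedDocuments(unittest.TestCase):
    def test_instance_meets_specification(self):
        doc = etree.fromstring("<report><para>hello</para></report>")
        SCHEMA.assertValid(doc)  # raises with a diagnostic if the spec is broken

if __name__ == "__main__":
    unittest.main()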
Where I am closer to Simon's view than perhaps others may be is that when we look at the above four motivating cases for using schemas, very often we do have some known process involved. So in fact static tests of the input and output documents are necessary but not sufficient: we need to check the input or output document against facts in the outside world that may be controlled by others or be dynamic (e.g., code lists), and we may need to check that transformations have indeed preserved information (e.g., if there are 5 paras in the input there should be 5 equivalent paras in the output).
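To make those two dynamic checks concrete, a rough sketch (Python with lxml again; the document shape, the code list and the supposed transformation output are all invented for illustration): one check compares an attribute against an externally controlled code list, the other confirms the transformation preserved the paragraph count.

from lxml import etree

# Invented, externally controlled code list; in practice it might be fetched from
# another system and change over time, which is why no static schema can own it.
CURRENCY_CODES = {"USD", "EUR", "JPY"}

src = etree.fromstring("<doc currency='EUR'><para>one</para><para>two</para></doc>")
out = etree.fromstring("<html><p>one</p><p>two</p></html>")  # pretend output of an XSLT step

# 1. A fact the schema alone cannot know: is the code on today's list?
assert src.get("currency") in CURRENCY_CODES, "unknown currency code"

# 2. Did the transformation preserve information? Same number of paras in and out.
assert len(src.findall(".//para")) == len(out.findall(".//p")), \
    "paragraphs lost or invented by the transformation"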
We need to make sure we don't use specific schemas where generic ones are more appropriate (e.g. simpler schemas with attribute values checked by other layers of validation, or partial schemas for envelopes and payloads), and that we have enough capability to convert generic to specific when tools require it.

Where I would disagree with Simon, I think, is that the advent of JSON for point-to-point interchange actually means that you should probably always use a schema with XML: if you don't need a schema, perhaps you should be using JSON?
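To make the envelope-versus-payload point above concrete, a rough sketch (Python with lxml; the envelope vocabulary is invented): the schema is strict about the wrapper but deliberately leaves the payload as a lax wildcard, so each payload vocabulary can be checked by its own layer of validation.

from lxml import etree

# Invented envelope schema: specific about the envelope, generic about the payload.
ENVELOPE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="envelope">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="sender" type="xs:string"/>
        <xs:element name="payload">
          <xs:complexType>
            <xs:sequence>
              <xs:any processContents="lax" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
envelope_schema = etree.XMLSchema(etree.fromstring(ENVELOPE_XSD))

msg = etree.fromstring(
    "<envelope><sender>acme</sender>"
    "<payload><invoice><total>42</total></invoice></payload></envelope>")

envelope_schema.assertValid(msg)  # the envelope is checked; the payload is someone else's contract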
Cheers
On Wed, Apr 10, 2013 at 12:36 PM, Len Bullard <cbullard@hiwaay.net> wrote:

Ghosts don’t speak unless spoken to.
Schemas/DTDs shine in environments where the generators are humans who are lazy, forgetful and/or badly trained. At my last job, of the 12 people working for me only one could read a DTD, yet all claimed to be XML experts in an application with a large and well-crafted DTD. Being the only other person on the hall who could read it was in no way job security, nor did it broker data goodness. They doggedly tagged 1500 work packages to the wrong part of the tree because the style sheet didn’t care, the customer didn’t look, and the files validated and ran in the system they were targeted to.
Contracts have to care. If they don’t then the humans won’t.
len
-----Original Message-----
How come Len Bullard doesn't kick in? This is so much about control.
On Wed, Apr 10, 2013 at 12:48 AM, Michael Sokolov <msokolov@safaribooksonline.com> wrote:
On 4/9/13 5:20 PM, Simon St.Laurent wrote:
On 4/9/13 5:16 PM, Toby Considine wrote: Sorry
Might be interesting as a straw man. I only
became aware of this piece of work at its tail end (years after any attempt at
standardization was abandoned, I think), when I guided some agonized engineers
through an implementation of a SOAP service in Perl that was supposed to provide
services to a Microsoft .NET consumer: these two software packages used
completely antagonistic approaches, as far as I could tell. The
"standards" were worse than useless; they should have been called web
services inoperability. There are at least two, maybe three completely
different interpretations of the SOAP vocabulary based on fundamentally
different conceptions of how to deliver web services, all masquerading under
the same heading of WS-I. There are layers of incomprehensible service
endpoint babbledygook that makes reading the actual markup nearly impossible:
it might as well be a binary format for all the benefit one gets from XML in
this arena. The current situation is that the only rational way to use SOAP is
to use two endpoints from the same provider, and never ever to look under the
covers at the XML that is being generated for you. At least that's how it
seemed to me as an infrequent user - I don't claim to be an expert. This
particular piece of software is the tar baby of our organization - touch it at
your peril.