XML.org: OASIS Mailing List Archives

 


Re: bigger church? (was software lock-in)

On 03/23/2017 11:57 AM, Dave Pawson wrote:

P.S.: SGML was "too hard", too.  The transformation into XML involved
shedding features that, in retrospect, were solutions to problems that had
once been considered compelling.
And that excludes a shift (for some?) to json Steve?

A call for a broader church perhaps?

"Yes" on the broader church, "no" on sacrificing consistency. Consistency is such an important meta-issue that it applies to meta-ness itself. The discussion we seem to be having is about the nature of the consistency of the XML mindset, both as it is, and as it might be. Is the notion of vendor lock-in essential to that consistency, or not? For me, it is. For others, evidently not.

About JSON and XML:

JSON has interoperability issues that can lead to vendor lock-in, but I sure do use it. Indeed, it's hard to imagine not using JSON. Just as hard as imagining not using XML, really.

I could unreservedly recommend JSON if it always worked the same way everywhere, but even then, not for all purposes. Meaningful work remains to be done on that problem. It's also just too quirky for my taste. Not being able to use a comma after the last item in a list is truly suboptimal. What harm would a final comma do, other than making it easier to edit and maintain raw hard-coded JSON? It's just crazy that JSON parsers can trip over that.
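
A minimal sketch of the final-comma problem, assuming Python's standard json module as the parser (other parsers may be more lenient, which is itself part of the interoperability issue). The helper name `parses` is just for illustration:

```python
import json


def parses(text: str) -> bool:
    """Return True if Python's standard json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


without_comma = '["a", "b", "c"]'
with_comma = '["a", "b", "c",]'  # same list, plus a comma after the last item

print(parses(without_comma))  # True
print(parses(with_comma))     # False: the final comma is a syntax error
```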

Even if it worked everywhere the same way, JSON still wouldn't compete with XML, and for fundamental reasons. JSON *can* handle mixed content, but it's truly awkward to work with mixed content in raw JSON.
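
To make the awkwardness concrete, here is one possible way to carry mixed content in JSON, sketched with Python's standard library. The `{"tag", "attrs", "content"}` layout and the function name `to_jsonable` are assumptions made up for this illustration, not an established convention:

```python
import json
import xml.etree.ElementTree as ET


def to_jsonable(elem):
    """Convert an element to a dict whose "content" list interleaves
    text strings and child elements in document order."""
    content = []
    if elem.text:
        content.append(elem.text)
    for child in elem:
        content.append(to_jsonable(child))
        if child.tail:
            content.append(child.tail)
    return {"tag": elem.tag, "attrs": dict(elem.attrib), "content": content}


xml_text = '<div class="p">See <i>Signatures</i>, later, for more information.</div>'
print(json.dumps(to_jsonable(ET.fromstring(xml_text)), indent=2))
```

One short line of XML becomes a nested structure of a dozen or more lines, and the running text is chopped into separate strings that a human editor has to stitch back together by eye.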

The rise of JSON was predictable enough, I think, once it became clear that in XML it was expected that whitespace would be inserted in data content just for the sake of the readability of the raw XML. That's a profoundly disagreeable idea for most programmers, and as a programmer I'm sympathetic. The expectation of added/changed whitespace has really troubled me. It was an aspect of the practice of SGML that should not have survived in XML. I think it survived only because existing SGML users couldn't imagine living without it, and they were not buying the W3C line that nobody would ever work with raw XML, namely that tools were what XML was all about. (Remembering that line still makes my blood pressure rise. Today it reminds me of Trump's florid assurances that, with him in charge, everything will be just great, very very great. People who had already adopted SGML were never going to buy the "just buy a tool" argument. They'd already been there, been burned to a crisp, and they had already paid the price of switching to SGML.)

From my perspective, both JSON's final comma infelicity and the expectation that XML applications will not be faithful to every character found in mixed content were failures to achieve the best possible balance between raw readability/maintainability, on the one hand, and guaranteed fidelity of parsed data to unparsed data in mixed content, on the other hand.

In JSON it shouldn't matter whether that final comma is there or not, but at least with JSON, whatever is expressed as string data presumably always corresponds character-for-character to what the string data are presumably intended to be.
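
A small sketch of that character-for-character expectation, again assuming Python's standard json module. The sample string is invented for the illustration:

```python
import json

# A string with a line break, doubled spaces, and a tab in it.
original = "See\nSignatures ,  later\t(two spaces, a tab, and a line break)"

encoded = json.dumps(original)  # whitespace is escaped, not reflowed
decoded = json.loads(encoded)

print(encoded)
print(decoded == original)  # True: every character survives the round trip
```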

In XML I should be able to format the raw XML of mixed content any way I like, as long as I preserve the distinction between data (which is sacred) and markup (which is *not* sacred). But in practice, raw XML is quite lame; the markup positively resists formatting. As things stand, and given the fact that I can't introduce whitespace into markup in one extremely sensitive position (see below), a typical, maintainable, eye-parsable XHTML file that I produce takes some practice to read. But at least it means exactly what it says. Here's a sample fragment (best viewed in monospace):

><div class="p"
id="TXMP6dfd8d0c"
>
If an agent is signing your return for you, a power of attorney (POA) must be
filed. Attach the POA to Form 8453 and file it using that form's instructions.
See
<a class="crossref"
href="../pub17/p17-006.htm#en_us_publink1000170600"
id="en_us_publink1000170491"
><i
>Signatures</i
></a
>, later, for more information on POAs.</div
><a name="en_us_publink1000170492"
></a
><a class="persistentid"
>taxmap/pub17/p17-004.htm#en_us_publink1000170492</a
><a name="TXMP7ce5fbad"
></a
><table border="0"
cellpadding="0"
cellspacing="0"
class="headertbl"
><tbody

I don't know anybody in XML-land who is quite as persnickety as I am about this, but it seems to me that it's just good practice to hold data sacrosanct. Expectations regarding the requirements of applications should not be mixed up in raw XML's readability requirements. And they *are* mixed up. The above example could be made a lot more readable if XML did not disallow the whitespace that SGML allows after the STAGO ("<") and ETAGO ("</") strings. If I could do that, then every sequence of lines of data could have exactly two markup sequences, one TAGC (">") at the beginning of a sequence of lines of data content, and one at the end (STAGO or ETAGO). It would be both simpler to format and clearer to read, even if every data character were exactly the one originally intended.
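
The formatting rule at work in the sample can be sketched with Python's xml.etree.ElementTree (an assumption for illustration; any conforming XML parser should behave the same way). Line breaks placed inside a tag, before its closing ">", never reach the data, while whitespace directly after "<" is rejected outright. The short `id` value is invented:

```python
import xml.etree.ElementTree as ET

compact = '<a class="crossref" id="x1"><i>Signatures</i></a>'
# Same element, with every line break hidden inside markup.
formatted = '<a class="crossref"\nid="x1"\n><i\n>Signatures</i\n></a\n>'


def data_of(xml_text: str) -> str:
    """Return all character data of the document, in order."""
    return "".join(ET.fromstring(xml_text).itertext())


print(data_of(compact) == data_of(formatted))  # True: the data is untouched

try:
    ET.fromstring('< a>Signatures</a>')  # whitespace right after "<"
except ET.ParseError as err:
    print("rejected:", err)
```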

Here's the nub of it. If you look closely at the above example, you'll find what SGML used to call a "record end" at the very beginning of the content of the <div>, and another one after the word "See". These record ends are there only because I'm being faithful to the original XML data from which this XHTML was made. I suspect the record ends were added to the original XML by the tool that was used to create this XML content in the first place. I suspect that the human being who presumably wrote these data wasn't even aware of their existence, so they are record-ends that nobody intended. Indeed, I suspect the writer typed a space after the "See" and the tool converted that space into a record-end. The record ends qua record ends certainly have no purpose here, other than showing exactly what the data were as I received them.
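
Those record ends can be made visible by parsing a shortened version of the sample. This is a sketch using Python's xml.etree.ElementTree; the fragment is abridged from the example above, and the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

fragment = (
    '<div class="p"\n'
    'id="TXMP6dfd8d0c"\n'
    '>\n'
    "Attach the POA to Form 8453 and file it using that form's instructions.\n"
    'See\n'
    '<a class="crossref"\n'
    '><i\n'
    '>Signatures</i\n'
    '></a\n'
    '>, later, for more information on POAs.</div\n'
    '>'
)

div = ET.fromstring(fragment)
leading_text = div.text  # character data before the <a> element

# repr() shows each line break in the data as \n
print(repr(leading_text))
print(repr(div[0].tail))
```

The line breaks inside the tags vanish, but the ones at the start of the content and after "See" come back from the parser as data.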

Am I the only person on this list who's bothered by those record ends? I very sincerely believe that markup tools should never decide what the data are. Today, if I need to be sure that data will *not* be tweaked by some tool somewhere, I simply can't use XML to interchange it, because if I do I'm asking for trouble. I'll just have to use something else, despite W3C's historical assurances that XML is all about machine-to-machine communications, and only the tools matter. As for JSON, I can't imagine any general-purpose JSON tool being intended to feel free to tweak anyone's data in any way, no matter how limited that way might be. In view of that profound cultural difference, and the huge increases in the numbers of people who are in the web services business and have programming responsibilities, it's not so surprising that JSON is eating XML's lunch, is it?
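
As one illustration of a tool deciding what the data are, here is a sketch using the pretty-printer in Python's standard xml.dom.minidom. It stands in for the general class of tools; it is not the tool that produced the sample above. The input string is invented:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

source = '<p>See <i>Signatures</i>, later.</p>'


def text_of(xml_text: str) -> str:
    """Return all character data of the document, in order."""
    return "".join(ET.fromstring(xml_text).itertext())


# Pretty-print for "readability", then parse the result again.
pretty = minidom.parseString(source).toprettyxml(indent="  ")

print(pretty)
print(repr(text_of(source)))
print(repr(text_of(pretty)))
print(text_of(source) == text_of(pretty))  # False: the data changed
```

The indentation and line breaks added for the sake of the raw XML's appearance end up inside the mixed content, so a consumer of the reformatted file no longer sees the characters the writer supplied.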

But XML's market-share-losing blunders are not nearly as important as, well, what's next? I'm pointing out that vendor lock-in is still a big problem, it's about to get orders of magnitude bigger, and the XML community should have something to say about it. We, no *you*, dear reader, might have something important to say about it. (Or not, which is also kind of interesting.)

Steve










Copyright 1993-2007 XML.org. This site is hosted by OASIS