Re: [xml-dev] Another way to present XML data

On Wed, Sep 13, 2017 at 7:31 PM, yamahito <yamahito@gmail.com> wrote:

The case I often find processors screwing up is:

underlined italic

Note the significant whitespace between the and 
On Tue, 12 Sep 2017 at 21:45 Michael Kay <mike@saxonica.com> wrote:
On 12 Sep 2017, at 21:14, jf.larvoire@free.fr wrote:

As I already answered to Michael Kay, this is HTML.

No, this is XML.

Google's KML is an excellent example, as are XSLT, SOAP, SVG, etc...
Of those three examples, two use mixed content:

XSLT:

<xsl:template match="cite">[<xsl:value-of select="."/>]</xsl:template>

SVG:
<text x="200" y="150" fill="blue">
  You are
  <tspan font-weight="bold" fill="red">not</tspan>
  a banana.
</text>
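
For readers unsure what "mixed content" means here: an element has mixed content when it carries character data of its own alongside child elements. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the helper name `has_mixed_content` is made up for this illustration and is not part of any tool discussed in the thread.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(elem):
    """Return True if elem has child elements and non-whitespace text of its own."""
    children = list(elem)
    if not children:
        return False
    # Text before the first child is stored in elem.text.
    if elem.text and elem.text.strip():
        return True
    # Text after each child is stored in that child's .tail.
    return any(bool(child.tail and child.tail.strip()) for child in children)


# The SVG fragment quoted above: text and a <tspan> child side by side.
svg_text = ET.fromstring(
    '<text x="200" y="150" fill="blue">\n'
    ' You are\n'
    ' <tspan font-weight="bold" fill="red">not</tspan>\n'
    ' a banana.\n'
    '</text>'
)

# A data-oriented fragment: only whitespace surrounds the child element.
kml_point = ET.fromstring(
    '<Point>\n <coordinates>-74.006393,40.714172,0</coordinates>\n</Point>'
)

print(has_mixed_content(svg_text))   # True
print(has_mixed_content(kml_point))  # False
```

The SVG `text` element is mixed content; the KML `Point` element is not, which is why whitespace handling matters for one and not the other.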
Michael Kay
Saxonica

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>New York City</name>
      <description>New York City</description>
      <Point>
        <coordinates>-74.006393,40.714172,0</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>

becomes

?xml version="1.0" encoding="UTF-8"
kml xmlns="http://www.opengis.net/kml/2.2" {
  Document {
    Placemark {
      name "New York City"
      description "New York City"
      Point {
        coordinates -74.006393,40.714172,0
      }
    }
  }
}

From: "Christophe Marchand" <cmarchand@oxiane.com>
To: xml-dev@lists.xml.org
Sent: Tuesday, 12 September 2017 20:56:47
Subject: Re: [xml-dev] Another way to present XML data

Maybe something like this:

This is a bold text with underlined italic content in.

Best,
Christophe

On 12/09/2017 at 18:32, jf.larvoire@free.fr wrote:

No, whitespace is significant, and it is preserved by my conversion script.

I don't really know what you mean by mixed content.

Please show me an example of mixed content that is not converted correctly.

Jean-François

From: "Michael Kay" <mike@saxonica.com>
To: "jf larvoire" <jf.larvoire@free.fr>
Cc: xml-dev@lists.xml.org
Sent: Tuesday, 12 September 2017 17:49:14
Subject: Re: [xml-dev] Another way to present XML data

I actually came across this proposal a couple of months ago when trying to do something similar myself - before I found it I had in fact reinvented a lot of this, with minor variations.

The thing I was finding most difficult was mixed content, and I don't think SML has really thought that through either. None of the examples seem to use mixed content, at any rate. And as far as I can see, the whitespace between elements is being treated as insignificant, which doesn't work in a mixed-content world.

Of course this is a really tough challenge, because you want to round-trip XML, but XML has this major design flaw that significant and insignificant whitespace can't be lexically distinguished. So you end up either replicating that design flaw, or sacrificing round-tripping.
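
To make the whitespace problem concrete, here is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`. The markup and the helper `strip_whitespace_only` are hypothetical and are not taken from any message in this thread; the helper imitates a tool that treats whitespace-only text between elements as insignificant.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content markup: the single space between </u> and <i>
# is the only thing separating the two words.
doc = "<p>Some <u>underlined</u> <i>italic</i> text.</p>"


def strip_whitespace_only(elem):
    """Drop every text node that consists only of whitespace."""
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_whitespace_only(child)
        if child.tail is not None and not child.tail.strip():
            child.tail = None


root = ET.fromstring(doc)
before = "".join(root.itertext())
strip_whitespace_only(root)
after = "".join(root.itertext())

print(repr(before))  # 'Some underlined italic text.'
print(repr(after))   # 'Some underlineditalic text.'
```

Nothing in the lexical form of that space distinguishes it from the indentation whitespace in a data-oriented document, so a converter has to either keep all of it or risk gluing words together.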

On a point of detail, it doesn't feel right to have both '&' and '\' as escape characters with different roles.

(Mixed content is not an edge case!)

Michael Kay

Saxonica

On 12 Sep 2017, at 15:31, jf.larvoire@free.fr wrote:

Following last month's thread about the relative merits of XML and JSON, I think there's a way to get the advantages of both:

Instead of creating yet another data format incompatible with XML, it's actually possible to _reversibly_ transform XML into something that is as human-friendly as JSON.
This way, we can present the same data as "normal" XML to be consumed by programs, or as "simplified" XML to be reviewed and edited by humans, and convert it back and forth between the two presentations when going from programs to humans and back.
Several years ago, I developed a Tcl script that does exactly that.

The script, and my proposed "Simplified XML" format, a.k.a. SML, were presented at the 2013 Tcl conference:
http://www.tclcommunityassociation.org/wub/proceedings/Proceedings-2013/JeanFrancoisLarvoire/A%20simpler%20and%20shorter%20representation%20of%20XML%20data%20inspired%20by%20Tcl.pdf
This script is open source and available here:
https://github.com/JFLarvoire/SysToolsLib/blob/master/Tcl/sml.tcl
To use it in Windows, you'll need to install a Tcl interpreter. See instructions on this page if needed:
https://github.com/JFLarvoire/SysToolsLib/tree/master/Tcl

Note that I've been told that another data format called sml was proposed in 1999.
Mine has no relationship whatsoever with the other.
If this homonymy is a problem, I'm open to any alternative name!

Here's an example of what the transformation does:

`cat sample.xml`

<?xml version="1.0"?>
<formats>
  <format name="XML">
    <author>W3C</author>
    <standardized>2008</standardized>
    <advantage>Can define formal syntaxes, that can be verified</advantage>
    <advantage>Widely adopted, many XML-based standards</advantage>
    <drawback>Hard to read by humans</drawback>
  </format>
  <format name="JSON">
    <author>Douglas Crockford</author>
    <standardized>2013</standardized>
    <advantage>Easy to read by humans</advantage>
    <drawback>Incompatible with XML</drawback>
  </format>
  <format name="SML">
    <author>Jean-Francois larvoire</author>
    <advantage>Same advantages as XML. It is XML presented differently.</advantage>
    <advantage>Easy to read by humans</advantage>
    <drawback>No I/O libraries available yet</drawback>
  </format>
</formats>

Then `cat sample.xml | sml` displays:

?xml version="1.0"
formats {
  format name="XML" {
    author W3C
    standardized 2008
    advantage "Can define formal syntaxes, that can be verified"
    advantage "Widely adopted, many XML-based standards"
    drawback "Hard to read by humans"
  }
  format name="JSON" {
    author "Douglas Crockford"
    standardized 2013
    advantage "Easy to read by humans"
    drawback "Incompatible with XML"
  }
  format name="SML" {
    author "Jean-Francois larvoire"
    advantage "Same advantages as XML. It is XML presented differently."
    advantage "Easy to read by humans"
    drawback "No I/O libraries available yet"
  }
}

And finally `cat sample.xml | sml | sml` outputs an identical copy of the initial XML file.
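
For readers who want to see the shape of the transformation without reading the Tcl source, here is a rough Python sketch. It is not the sml.tcl algorithm: the quoting rule (quote text that contains whitespace) is only inferred from the sample output above, whitespace between elements is dropped rather than preserved, and anything the sample does not show (mixed content, empty elements, escaping, the XML declaration, namespaces) is rejected or ignored. The names `to_sml` and `quote` are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def quote(text):
    """Assumed rule, inferred from the sample: quote only text containing whitespace."""
    if '"' in text or "\\" in text:
        raise NotImplementedError("escaping rules are outside this sketch")
    return f'"{text}"' if any(ch.isspace() for ch in text) else text


def to_sml(elem, depth=0):
    """Return the SML-like lines for elem (data-oriented XML only)."""
    pad = "  " * depth
    # Attribute values are copied as-is; no escaping is attempted here.
    head = elem.tag + "".join(f' {k}="{v}"' for k, v in elem.attrib.items())
    children = list(elem)
    text = (elem.text or "").strip()
    if children:
        if text or any((c.tail or "").strip() for c in children):
            raise NotImplementedError("mixed content is outside this sketch")
        lines = [f"{pad}{head} {{"]
        for child in children:
            lines.extend(to_sml(child, depth + 1))
        lines.append(f"{pad}}}")
        return lines
    if not text:
        raise NotImplementedError("empty elements are not shown in the sample")
    return [f"{pad}{head} {quote(text)}"]


sample = """<formats>
  <format name="JSON">
    <author>Douglas Crockford</author>
    <standardized>2013</standardized>
  </format>
</formats>"""

print("\n".join(to_sml(ET.fromstring(sample))))
```

Run on that shortened sample, the sketch prints a `formats { ... }` block with `author "Douglas Crockford"` quoted and `standardized 2013` unquoted, matching the corresponding lines of the output shown above.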

Try it out with any of your XML files, and you'll be surprised at how much easier it is to edit them!
XSLT files transformed this way even become pleasant for C programmers!

Of course, there's nothing that prevents programs from actually producing or consuming SML directly. There are no libraries available for doing it yet, but in simple cases such as the example above, the parsing is relatively trivial.
The nice thing is that this transition can be done progressively, as the new SML-aware programs will remain compatible with your old programs that only know about standard XML.
Simply pipe the data through the sml.tcl script to make them understand each other!
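
As an illustration of how small such a parser can be for the simple cases, here is a hedged Python sketch that reads a line-oriented, SML-like subset and builds an element tree. It is an assumption-laden toy, not the sml.tcl grammar: it expects one element per line, `key="value"` attributes, a value that is either a double-quoted string or a bare token, and `{` / `}` for nesting. It does not handle escapes, mixed content, or the `?xml` line. The names `parse_sml`, `LINE`, and `ATTR` are invented here.

```python
import re
import xml.etree.ElementTree as ET

# One element per line: name, optional key="value" pairs, then "{" or a value.
LINE = re.compile(r'''
    (?P<name>[^\s{}"=]+)
    (?P<attrs>(?:\s+[^\s{}"=]+="[^"]*")*)
    (?:\s+(?P<rest>.*))?$
''', re.VERBOSE)
ATTR = re.compile(r'([^\s{}"=]+)="([^"]*)"')


def parse_sml(text):
    """Parse the simple subset described above into an ElementTree element."""
    root = None
    stack = []  # currently open elements, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "}":
            if not stack:
                raise ValueError("unbalanced closing brace")
            stack.pop()
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {raw!r}")
        attrs = dict(ATTR.findall(m.group("attrs")))
        if stack:
            elem = ET.SubElement(stack[-1], m.group("name"), attrs)
        elif root is None:
            elem = root = ET.Element(m.group("name"), attrs)
        else:
            raise ValueError("more than one root element")
        rest = m.group("rest")
        if rest == "{":
            stack.append(elem)
        elif rest is not None:
            quoted = len(rest) >= 2 and rest[0] == '"' and rest[-1] == '"'
            elem.text = rest[1:-1] if quoted else rest
    if stack:
        raise ValueError("unclosed block")
    return root


sml_text = """formats {
  format name="JSON" {
    author "Douglas Crockford"
    standardized 2013
  }
}"""

print(ET.tostring(parse_sml(sml_text), encoding="unicode"))
```

The result is serialized without the original indentation, so this toy does not round-trip byte for byte; it only shows that the element structure of the simple cases is easy to recover.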

Limitations: The sml.tcl program is well tested (I've used it for years), but it has known limitations:
- It will not convert UTF-16 or EBCDIC or MBCS files correctly.
- I've done very little testing with Unicode characters > \u00FF, so I'm not sure it will work fine with these in UTF-8 files.
- I've tested the reversibility with the whole libxml2 test suite. Still, my parsing definitely does not cover all corner cases of the XML specification. There surely are bugs still hiding there, but hopefully only for the least used features of XML.
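
Given the first limitation, it may be worth checking a file's encoding before piping it through the script. The following is a minimal Python sketch of such a pre-check, based on the byte patterns listed in the XML specification's appendix on encoding autodetection. The function name `looks_like_utf16` is made up here, and the check is only a heuristic: it also returns True for a UTF-32 little-endian byte order mark, and it does not detect EBCDIC or multi-byte legacy encodings.

```python
import codecs


def looks_like_utf16(data: bytes) -> bool:
    """Heuristic: UTF-16 BOM, or '<?' with its ASCII bytes interleaved with NULs."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return True
    # UTF-16 without a BOM: "<?" appears as 3C 00 3F 00 (LE) or 00 3C 00 3F (BE).
    return data.startswith((b"<\x00?\x00", b"\x00<\x00?"))


utf8_doc = b'<?xml version="1.0"?><a/>'
utf16_doc = '<?xml version="1.0"?><a/>'.encode("utf-16")  # includes a BOM

print(looks_like_utf16(utf8_doc))   # False
print(looks_like_utf16(utf16_doc))  # True
```

In practice one would read the first few bytes of the file in binary mode and pass them to the function before deciding whether to run the conversion.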

Any feedback welcome!

Please report any bugs on the GitHub issues list:
https://github.com/JFLarvoire/SysToolsLib/issues