Re: [xml-dev] Another way to present XML data

I see.

What you call mixed content is basically what you find in HTML files.
My script converts XHTML reversibly, including many complex mixed-content cases. (I've tested it successfully on the XML specification itself :-) )
But it will not convert plain HTML reversibly.

Also, my SML format is optimized for XML data sets formatted the conventional way.
In that case the result is really simple, as shown in the example in my first message.
If there are multiple elements on the same line, or worse still mixed content, the converted files aren't as simple, and the benefit is smaller.

Now to answer your questions, sml.tcl converts...

<formats>
    <format name="XML">
        <author>W<sub>3</sub>C</author>

... to: 

formats {
    format name="XML" {
        author {"W";sub 3;"C"}}}

And... 

<formats>
    <format name="XML">
author<author>W3C</author>

... to: 

formats {
    format name="XML" {
        "author";author W3C}}

If you want to try other, more complex cases and check the reversibility for yourself, I encourage you to download the script and give it a try.
On Linux, it will run directly. (Rename it to just "sml" to make it shorter to type.)
On Windows, there's an interpreter to install first, but hopefully the instructions mentioned in my first mail make that easy.


From: "Michael Kay" <mike@saxonica.com>
To: "jf larvoire" <jf.larvoire@free.fr>
Cc: xml-dev@lists.xml.org
Sent: Tuesday, 12 September 2017 19:02:03
Subject: Re: [xml-dev] Another way to present XML data

I haven't tried the conversion tool, I've only been looking in the spec.

Mixed content is essentially an element node that has both element and text node children.
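
To make that definition concrete, here is a minimal Python sketch that flags elements with mixed content. It is not part of sml.tcl; the function names `has_mixed_content` and `mixed_elements` are made up for illustration. It also assumes that whitespace-only text between child elements is indentation rather than content, which is a simplification and not something the XML syntax itself can tell you.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(elem):
    """True if elem has child elements and also non-whitespace text among them."""
    if len(elem) == 0:
        return False  # no child elements, so the content cannot be mixed
    # Text can sit before the first child (elem.text) or after any child (child.tail).
    texts = [elem.text] + [child.tail for child in elem]
    return any(t and t.strip() for t in texts)

def mixed_elements(xml_text):
    """Return the tags of all elements with mixed content, in document order."""
    root = ET.fromstring(xml_text)
    return [e.tag for e in root.iter() if has_mixed_content(e)]

doc = '<formats><format name="XML"><author>W<sub>3</sub>C</author></format></formats>'
print(mixed_elements(doc))  # prints ['author']
```

Here only `author` is reported, because it is the only element with both a child element (`sub`) and text children ("W" and "C").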

You convert 

<formats>
    <format name="XML">
<author>W3C</author>

to

formats {
    format name="XML" {
author W3C

so how would you convert

<formats>
    <format name="XML">
<author>W<sub>3</sub>C</author>

and what would happen to:

<formats>
    <format name="XML">
author<author>W3C</author>
Michael Kay
Saxonica

On 12 Sep 2017, at 17:32, jf.larvoire@free.fr wrote:

No, whitespace is significant, and it is preserved by my conversion script.

I don't really know what you mean by mixed content.
Please show me an example of mixed content that is not converted correctly.

Jean-François


From: "Michael Kay" <mike@saxonica.com>
To: "jf larvoire" <jf.larvoire@free.fr>
Cc: xml-dev@lists.xml.org
Sent: Tuesday, 12 September 2017 17:49:14
Subject: Re: [xml-dev] Another way to present XML data

I actually came across this proposal a couple of months ago when trying to do something similar myself - before I found it I had in fact reinvented a lot of this, with minor variations.

The thing I was finding most difficult was mixed content, and I don't think SML has really thought that through either. None of the examples seem to use mixed content, at any rate. And as far as I can see, the whitespace between elements is being treated as insignificant, which doesn't work in a mixed-content world.

Of course this is a really tough challenge because you want to round-trip XML, but XML has this major design flaw that significant and insignificant whitespace can't be lexically distinguished. So you end up either replicating that design flaw, or sacrificing round-tripping.
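
A small Python illustration of the whitespace problem, using only the standard library. The sample strings and the helper name `text_nodes` are invented for this sketch. It lists every text chunk a parser reports, and the indentation in the third document comes back as text in exactly the same way as the meaningful spaces in the second.

```python
import xml.etree.ElementTree as ET

compact = '<author>W<sub>3</sub>C</author>'
spaced = '<author>W <sub>3</sub> C</author>'                 # spaces that change the text
pretty = '<format>\n    <author>W3C</author>\n</format>'     # spaces that are only layout

def text_nodes(xml_text):
    """Return every text chunk in document order, whitespace included."""
    root = ET.fromstring(xml_text)
    return list(root.itertext())

for sample in (compact, spaced, pretty):
    print(text_nodes(sample))
```

Nothing in the parser output marks the newline-plus-indent chunks as ignorable; only a schema or human knowledge of the vocabulary can do that.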

On a point of detail, it doesn't feel right to have both '&' and '\' as escape characters with different roles.

(Mixed content is not an edge case!)

Michael Kay
Saxonica


On 12 Sep 2017, at 15:31, jf.larvoire@free.fr wrote:

Following last month's thread about the relative merits of XML and JSON, I think there's a way to get the advantages of both:

Instead of creating yet another data format incompatible with XML, it's actually possible to transform XML _reversibly_ into something that is as human-friendly as JSON.
This way, we can present the same data as "normal" XML to be consumed by programs, or as "simplified" XML to be reviewed and edited by humans, and convert it back and forth between the two presentations when going from programs to humans and back.
Several years ago I developed a Tcl script that does exactly that.

The script, and my proposed "Simplified XML" format, a.k.a. SML, were presented at the 2013 Tcl conference:
http://www.tclcommunityassociation.org/wub/proceedings/Proceedings-2013/JeanFrancoisLarvoire/A%20simpler%20and%20shorter%20representation%20of%20XML%20data%20inspired%20by%20Tcl.pdf
The script is open source and available here:
https://github.com/JFLarvoire/SysToolsLib/blob/master/Tcl/sml.tcl
To use it in Windows, you'll need to install a Tcl interpreter. See instructions on this page if needed:
https://github.com/JFLarvoire/SysToolsLib/tree/master/Tcl

Note that I've been told that another data format called sml was proposed in 1999.
Mine has no relationship whatsoever with the other.
If this homonymy is a problem, I'm open to any alternative name!

Here's an example of what the transformation does:

`cat sample.xml`

<?xml version="1.0"?>
<formats>
    <format name="XML">
        <author>W3C</author>
        <standardized>2008</standardized>
        <advantage>Can define formal syntaxes, that can be verified</advantage>
        <advantage>Widely adopted, many XML-based standards</advantage>
        <drawback>Hard to read by humans</drawback>
    </format>
    <format name="JSON">
        <author>Douglas Crockford</author>
        <standardized>2013</standardized>
        <advantage>Easy to read by humans</advantage>
        <drawback>Incompatible with XML</drawback>
    </format>
    <format name="SML">
        <author>Jean-Francois larvoire</author>
        <advantage>Same advantages as XML. It is XML presented differently.</advantage>
        <advantage>Easy to read by humans</advantage>
        <drawback>No I/O libraries available yet</drawback>
    </format>
</formats>

Then `cat sample.xml | sml` displays:

?xml version="1.0"
formats {
    format name="XML" {
        author W3C
        standardized 2008
        advantage "Can define formal syntaxes, that can be verified"
        advantage "Widely adopted, many XML-based standards"
        drawback "Hard to read by humans"
    }
    format name="JSON" {
        author "Douglas Crockford"
        standardized 2013
        advantage "Easy to read by humans"
        drawback "Incompatible with XML"
    }
    format name="SML" {
        author "Jean-Francois larvoire"
        advantage "Same advantages as XML. It is XML presented differently."
        advantage "Easy to read by humans"
        drawback "No I/O libraries available yet"
    }
}

And finally `cat sample.xml | sml | sml` outputs an identical copy of the initial XML file.
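
For checking many files, the same round-trip test can be scripted. The following Python sketch assumes the script has been renamed to `sml` and is on the PATH, as in the pipeline above; the helper names `run_sml` and `round_trips` are hypothetical and not part of SysToolsLib.

```python
import subprocess
from typing import Callable

def run_sml(data: bytes, command: str = "sml") -> bytes:
    """Pipe data through the converter; `command` is an assumed name on the PATH."""
    result = subprocess.run([command], input=data, capture_output=True, check=True)
    return result.stdout

def round_trips(original: bytes, convert: Callable[[bytes], bytes] = run_sml) -> bool:
    """Convert twice (XML -> SML -> XML) and compare the result byte for byte."""
    return convert(convert(original)) == original
```

Typical use would be to read a file in binary mode and call `round_trips(data)`. Comparing bytes rather than parsed trees is deliberate: it also catches changes in whitespace, which is the sensitive point for reversibility.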

Try it out with any of your XML files, and you'll be surprised at how much easier it is to edit them!
XSLT files transformed this way even become pleasant for C programmers!

Of course, nothing prevents programs from producing or consuming SML directly. There are no libraries available for doing so yet, but in simple cases such as the example above, the parsing is relatively trivial.
The nice thing is that this transition can be done progressively, as the new SML-aware programs will remain compatible with your old programs that only know about standard XML.
Simply pipe the data through the sml.tcl script to make them understand each other!
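
To give an idea of how little is needed for the simple case, here is a rough Python sketch of a reader for the subset used in the sample above: one element per line, optional name="value" attributes, then either a `{` opening a block, a quoted string, or a bare word. It is an illustration only, not the SML grammar from the paper and not how sml.tcl works. It strips indentation (so it is not reversible), and it ignores escapes, mixed content, and several elements on one line. The name `parse_simple_sml` is made up.

```python
import re
import xml.etree.ElementTree as ET

ATTR = re.compile(r'([\w:.-]+)="([^"]*)"')
LINE = re.compile(r'^([\w:.-]+)((?:\s+[\w:.-]+="[^"]*")*)\s*(.*)$')

def parse_simple_sml(text):
    """Build an ElementTree from the simple one-element-per-line SML subset."""
    root = None
    stack = []  # currently open elements, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('?'):
            continue  # skip blank lines and the ?xml declaration
        if line == '}':
            stack.pop()  # close the innermost block
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError("unsupported line: " + raw)
        name, attrs, rest = match.groups()
        elem = ET.Element(name, dict(ATTR.findall(attrs)))
        if stack:
            stack[-1].append(elem)
        elif root is None:
            root = elem
        else:
            raise ValueError("more than one root element")
        if rest == '{':
            stack.append(elem)       # children follow on the next lines
        elif len(rest) >= 2 and rest.startswith('"') and rest.endswith('"'):
            elem.text = rest[1:-1]   # quoted text value
        elif rest:
            elem.text = rest         # bare word value
    return root

sample = '''?xml version="1.0"
formats {
    format name="JSON" {
        author "Douglas Crockford"
        standardized 2013
    }
}'''
root = parse_simple_sml(sample)
print(ET.tostring(root, encoding="unicode"))
```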

Limitations: The sml.tcl program is well tested (I've used it for years), but it has known limitations:
- It will not convert UTF-16 or EBCDIC or MBCS files correctly.
- I've done very little testing with Unicode characters > \u00FF, so I'm not sure it will work fine with these in UTF-8 files.
- I've tested the reversibility with the whole libxml2 test suite. Still, my parsing definitely does not cover all corner cases of the XML specification. There surely are bugs still hiding there, but hopefully only for the least used features of XML.
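
Given the encoding limitations listed above, it may be worth screening a file before piping it through the script. The Python sketch below is a heuristic of my own, not something sml.tcl does: the function name `encoding_warnings` is hypothetical, it only recognizes UTF-16 when a byte order mark is present, and it cannot reliably identify EBCDIC or other multi-byte code pages beyond noticing that the bytes are not valid UTF-8.

```python
import codecs

def encoding_warnings(data: bytes) -> list:
    """Return a list of reasons why the conversion of `data` might be unreliable."""
    warnings = []
    # UTF-16 files normally start with a byte order mark.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        warnings.append("UTF-16 byte order mark found")
        return warnings
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        warnings.append("not valid UTF-8")
        return warnings
    # Characters above U+00FF are the range described as little tested.
    if any(ord(ch) > 0xFF for ch in text):
        warnings.append("contains characters above U+00FF")
    return warnings
```

An empty list means none of these checks fired; it does not guarantee a correct conversion.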

Any feedback welcome!

Please report any bugs on the GitHub issues list:
https://github.com/JFLarvoire/SysToolsLib/issues







