OASIS Mailing List Archives

Another way to present XML data

Following last month's thread about the relative merits of XML and JSON, I think there's a way to get the advantages of both:

Instead of creating yet another data format incompatible with XML, it's actually possible to _reversibly_ transform XML into something that is as human-friendly as JSON.
This way, we can present the same data as "normal" XML to be consumed by programs, or as "simplified" XML to be reviewed and edited by humans, and convert it back and forth between the two presentations as it moves between programs and humans.
Several years ago, I developed a Tcl script that does exactly that.

The script and my proposed "Simplified XML" format, a.k.a. SML, were presented at the 2013 Tcl conference:
http://www.tclcommunityassociation.org/wub/proceedings/Proceedings-2013/JeanFrancoisLarvoire/A%20simpler%20and%20shorter%20representation%20of%20XML%20data%20inspired%20by%20Tcl.pdf
The script is open source and available here:
https://github.com/JFLarvoire/SysToolsLib/blob/master/Tcl/sml.tcl
To use it in Windows, you'll need to install a Tcl interpreter. See instructions on this page if needed:
https://github.com/JFLarvoire/SysToolsLib/tree/master/Tcl

Note that I've been told another data format called sml was proposed in 1999.
Mine has no relationship whatsoever with that one.
If the name clash is a problem, I'm open to any alternative name!

Here's an example of what the transformation does:

`cat sample.xml`

<?xml version="1.0"?>
<formats>
    <format name="XML">
        <author>W3C</author>
        <standardized>2008</standardized>
        <advantage>Can define formal syntaxes, that can be verified</advantage>
        <advantage>Widely adopted, many XML-based standards</advantage>
        <drawback>Hard to read by humans</drawback>
    </format>
    <format name="JSON">
        <author>Douglas Crockford</author>
        <standardized>2013</standardized>
        <advantage>Easy to read by humans</advantage>
        <drawback>Incompatible with XML</drawback>
    </format>
    <format name="SML">
        <author>Jean-Francois larvoire</author>
        <advantage>Same advantages as XML. It is XML presented differently.</advantage>
        <advantage>Easy to read by humans</advantage>
        <drawback>No I/O libraries available yet</drawback>
    </format>
</formats>

Then `cat sample.xml | sml` displays:

?xml version="1.0"
formats {
    format name="XML" {
        author W3C
        standardized 2008
        advantage "Can define formal syntaxes, that can be verified"
        advantage "Widely adopted, many XML-based standards"
        drawback "Hard to read by humans"
    }
    format name="JSON" {
        author "Douglas Crockford"
        standardized 2013
        advantage "Easy to read by humans"
        drawback "Incompatible with XML"
    }
    format name="SML" {
        author "Jean-Francois larvoire"
        advantage "Same advantages as XML. It is XML presented differently."
        advantage "Easy to read by humans"
        drawback "No I/O libraries available yet"
    }
}

And finally `cat sample.xml | sml | sml` outputs an identical copy of the initial XML file.
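
To give a feel for the XML-to-SML direction, here is a minimal Python sketch. It is not the sml.tcl algorithm: it only covers the subset seen in the example above (elements with attributes that contain either child elements or plain text), and the quoting rule in the hypothetical `quote` helper (quote anything that is not a single plain word, with backslash escapes) is my assumption based on that example. The function names and the treatment of empty elements are also illustrative only.

```python
import xml.etree.ElementTree as ET

def quote(text):
    """Quote a value unless it is a single plain word (assumed rule)."""
    if text and not any(c.isspace() or c in '"{}\\;=' for c in text):
        return text
    # Backslash escaping is an assumption, not taken from the SML paper.
    return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'

def to_sml(elem, depth=0, indent="    "):
    """Render an element as a list of SML-like lines (simple subset only)."""
    pad = indent * depth
    # Attribute values are copied as-is; embedded quotes are not handled.
    head = elem.tag + "".join(f' {k}="{v}"' for k, v in elem.attrib.items())
    children = list(elem)
    text = (elem.text or "").strip()
    if children:
        # Element-only content: one child per line inside braces.
        # Mixed content (text next to child elements) is ignored here.
        lines = [f"{pad}{head} {{"]
        for child in children:
            lines.extend(to_sml(child, depth + 1, indent))
        lines.append(f"{pad}}}")
        return lines
    if text:
        return [f"{pad}{head} {quote(text)}"]
    return [f"{pad}{head}"]  # empty element: rendering is an assumption

sample = """<formats>
    <format name="JSON">
        <author>Douglas Crockford</author>
        <standardized>2013</standardized>
    </format>
</formats>"""
print("\n".join(to_sml(ET.fromstring(sample))))
```

Unlike the real script, this sketch is not reversible: it drops the XML declaration, comments, and the original whitespace, all of which a faithful round trip has to preserve.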

Try it out with any of your XML files, and you'll be surprised at how much easier it is to edit them!
XSLT files transformed this way even become pleasant for C programmers!

Of course, nothing prevents programs from producing or consuming SML directly. There are no libraries available for doing it yet, but in simple cases like the example above, the parsing is fairly trivial.
The nice thing is that this transition can be done progressively, as the new SML-aware programs will remain compatible with your old programs that only know about standard XML.
Simply pipe the data through the sml.tcl script to make them understand each other!
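
As an illustration of how simple the parsing can be in such cases, here is a rough Python sketch that reads the line-oriented subset used in the example (one element per line, a trailing `{` opening a block, a lone `}` closing it) into an ElementTree. The `TOKEN` pattern, the `parse_sml` name, and the backslash unescaping are assumptions made for this sketch; it is not a port of sml.tcl and ignores most of what the real format has to support.

```python
import re
import xml.etree.ElementTree as ET

# One token is an attribute, a quoted text value, or a bare word ({ and } included).
TOKEN = re.compile(r'''
    (?P<attr>[^\s="{}]+)="(?P<aval>[^"]*)"   # name="value"
  | "(?P<qtext>(?:[^"\\]|\\.)*)"             # "quoted text"
  | (?P<word>[^\s"]+)                        # bare word, { or }
''', re.VERBOSE)

def parse_sml(text):
    """Parse the simple SML subset into an ElementTree element."""
    root = None
    stack = []  # currently open elements, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("?"):
            continue  # skip blank lines and the ?xml declaration
        if line == "}":
            stack.pop()
            continue
        tokens = list(TOKEN.finditer(line))
        elem = ET.Element(tokens[0].group("word"))
        opens_block = False
        for tok in tokens[1:]:
            if tok.group("attr"):
                elem.set(tok.group("attr"), tok.group("aval"))
            elif tok.group("word") == "{":
                opens_block = True
            elif tok.group("qtext") is not None:
                # Assumed escape convention: backslash protects the next char.
                elem.text = re.sub(r'\\(.)', r'\1', tok.group("qtext"))
            else:
                elem.text = tok.group("word")
        if stack:
            stack[-1].append(elem)
        elif root is None:
            root = elem
        if opens_block:
            stack.append(elem)
    return root

sml = '''formats {
    format name="JSON" {
        author "Douglas Crockford"
        standardized 2013
    }
}'''
print(ET.tostring(parse_sml(sml), encoding="unicode"))
```

The resulting tree can then be handed to any existing XML-aware code, which is the point of keeping SML strictly equivalent to XML.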

Limitations: The sml.tcl program is well tested (I've used it for years), but it has known limitations:
- It will not convert UTF-16 or EBCDIC or MBCS files correctly.
- I've done very little testing with Unicode characters > \u00FF, so I'm not sure it handles them correctly in UTF-8 files.
- I've tested the reversibility with the whole libxml2 test suite. Still, my parsing definitely does not cover all corner cases of the XML specification. There surely are bugs still hiding there, but hopefully only for the least used features of XML.
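
Given the encoding limitations above, it may be worth checking a file before piping it through the script. The following Python sketch guesses the encoding family from the first bytes, using byte-order marks and the leading-byte patterns described in the XML specification's appendix on encoding autodetection. The function names are hypothetical, MBCS code pages cannot be detected this way, and treating a UTF-8 BOM as acceptable is my assumption, not something I have verified against sml.tcl.

```python
import codecs

# UTF-32 is listed first because its LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(data: bytes) -> str:
    """Guess the encoding family of XML bytes from the first few bytes."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # No BOM: in UTF-16, the leading '<' is padded with a NUL byte.
    # (UTF-32 without a BOM is not distinguished in this sketch.)
    if data[:2] == b"<\x00":
        return "utf-16-le"
    if data[:2] == b"\x00<":
        return "utf-16-be"
    if data[:4] == b"\x4c\x6f\xa7\x94":  # '<?xm' in EBCDIC
        return "ebcdic"
    return "utf-8-compatible"  # ASCII, UTF-8, or an undetected 8-bit code page

def looks_unsupported(data: bytes) -> bool:
    """True when the bytes appear to be UTF-16, UTF-32, or EBCDIC."""
    return sniff_encoding(data) not in ("utf-8-sig", "utf-8-compatible")

print(looks_unsupported("<?xml version='1.0'?><a/>".encode("utf-16")))
print(looks_unsupported(b"<?xml version='1.0'?><a/>"))
```

A file flagged this way would need to be transcoded to UTF-8 (and its XML declaration adjusted) before conversion.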

Any feedback welcome!

Please report any bugs on the GitHub issues list:
https://github.com/JFLarvoire/SysToolsLib/issues


