Following last month's thread about the relative merits of XML and JSON, I think there's a way to get the advantages of both: instead of creating yet another data format incompatible with XML, it's actually possible to _reversibly_ transform XML into something that is as human-friendly as JSON.
This way, we can present the same data as "normal" XML to be consumed by programs, or as "simplified" XML to be reviewed and edited by humans, and convert it back and forth between the two presentations as it passes from programs to humans and back.
Several years ago, I developed a Tcl script that does exactly that.
The script and my proposed "Simplified XML" format, a.k.a. SML, were presented at the 2013 Tcl conference:
http://www.tclcommunityassociation.org/wub/proceedings/Proceedings-2013/JeanFrancoisLarvoire/A%20simpler%20and%20shorter%20representation%20of%20XML%20data%20inspired%20by%20Tcl.pdf
The script is open source and available here:
https://github.com/JFLarvoire/SysToolsLib/blob/master/Tcl/sml.tcl
To use it on Windows, you'll need to install a Tcl interpreter. See the instructions on this page if needed:
https://github.com/JFLarvoire/SysToolsLib/tree/master/Tcl
Note that I've been told that another data format called SML was proposed in 1999. Mine has no relationship whatsoever with that one. If this name clash is a problem, I'm open to any alternative name!
Here's an example of what the transformation does. `cat sample.xml` displays:

    <?xml version="1.0"?>
    <formats>
      <format name="XML">
        <author>W3C</author>
        <standardized>2008</standardized>
        <advantage>Can define formal syntaxes, that can be verified</advantage>
        <advantage>Widely adopted, many XML-based standards</advantage>
        <drawback>Hard to read by humans</drawback>
      </format>
      <format name="JSON">
        <author>Douglas Crockford</author>
        <standardized>2013</standardized>
        <advantage>Easy to read by humans</advantage>
        <drawback>Incompatible with XML</drawback>
      </format>
      <format name="SML">
        <author>Jean-Francois Larvoire</author>
        <advantage>Same advantages as XML. It is XML presented differently.</advantage>
        <advantage>Easy to read by humans</advantage>
        <drawback>No I/O libraries available yet</drawback>
      </format>
    </formats>

Then `cat sample.xml | sml` displays:

    ?xml version="1.0"
    formats {
      format name="XML" {
        author W3C
        standardized 2008
        advantage "Can define formal syntaxes, that can be verified"
        advantage "Widely adopted, many XML-based standards"
        drawback "Hard to read by humans"
      }
      format name="JSON" {
        author "Douglas Crockford"
        standardized 2013
        advantage "Easy to read by humans"
        drawback "Incompatible with XML"
      }
      format name="SML" {
        author "Jean-Francois Larvoire"
        advantage "Same advantages as XML. It is XML presented differently."
        advantage "Easy to read by humans"
        drawback "No I/O libraries available yet"
      }
    }
And finally, `cat sample.xml | sml | sml` outputs an identical copy of the initial XML file.
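To make the mapping concrete, here is a minimal Python sketch of the XML-to-SML direction for simple documents like the sample above. It is not a port of sml.tcl: it relies on the standard `xml.etree.ElementTree` parser, so it drops the XML declaration, comments and the original whitespace, and ignores text mixed with child elements. That means it is _not_ reversible the way the real script is. The quoting rule (quote a value when it contains spaces, quotes, braces, backslashes or `=`) and the bare `tag` line for empty elements are assumptions inferred from the sample, not documented SML rules.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Characters that force a text value to be quoted. This rule is an
# assumption inferred from the sample above, not taken from sml.tcl.
_SPECIAL = set(' \t\n"{}=\\')


def _value(text):
    """Return text bare if it is a simple word, else as a quoted string."""
    if text and not (_SPECIAL & set(text)):
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def element_to_sml(elem, depth=0, indent="  "):
    """Convert one element (and its subtree) to a list of SML lines."""
    pad = indent * depth
    # Attributes keep their XML form: name="value".
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    head = pad + elem.tag + attrs
    children = list(elem)
    if children:
        # Elements with children open a brace block; their own text is ignored.
        lines = [head + " {"]
        for child in children:
            lines.extend(element_to_sml(child, depth + 1, indent))
        lines.append(pad + "}")
        return lines
    text = (elem.text or "").strip()
    # Leaf element: tag followed by its value, or the bare tag if empty (assumption).
    return [head + " " + _value(text)] if text else [head]


def xml_to_sml(xml_text):
    """Convert a simple XML document to SML text (declaration and comments are dropped)."""
    return "\n".join(element_to_sml(ET.fromstring(xml_text)))
```

With the sample file, `print(xml_to_sml(open("sample.xml").read()))` should print the SML shown above, minus the `?xml version="1.0"` line.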
Try it out with any of your XML files, and you'll be surprised at how much easier they are to edit!
XSLT files transformed this way even become pleasant for C programmers!
Of course, nothing prevents programs from producing or consuming SML directly. No libraries are available for doing so yet, but in simple cases like the example above, parsing is relatively trivial.
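As an illustration of how simple those cases are, here is a minimal, line-based Python sketch of the reverse direction (SML to XML). It assumes each element sits on its own line, as in the example: a tag, optional `name="value"` attributes, then either a bare word, a double-quoted string (with backslash escapes, an assumed convention), an opening `{`, or nothing for an empty element; a lone `}` closes the current element. It regenerates indentation instead of preserving the original whitespace, so it is a reading aid, not a substitute for sml.tcl.

```python
import re
from xml.sax.saxutils import escape

# One SML element line: a tag, optional name="value" attributes, then
# optionally "{", a double-quoted string, or a bare word.
_LINE = re.compile(
    r'^(?P<tag>[^\s"{}=]+)'
    r'(?P<attrs>(?:\s+[^\s"=]+="[^"]*")*)'
    r'(?:\s+(?P<rest>\{|"(?:[^"\\]|\\.)*"|[^\s"{}=]+))?\s*$'
)


def sml_to_xml(sml_text, indent="  "):
    """Convert simple, one-element-per-line SML back to indented XML."""
    out, stack = [], []  # stack holds the tags of currently open blocks
    for raw in sml_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "}":
            if not stack:
                raise ValueError("unbalanced '}'")
            tag = stack.pop()
            out.append(indent * len(stack) + f"</{tag}>")
            continue
        pad = indent * len(stack)
        if line.startswith("?"):
            # Processing instruction, e.g. ?xml version="1.0"
            out.append(f"{pad}<{line}?>")
            continue
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported SML line: {raw!r}")
        tag, attrs, rest = m.group("tag", "attrs", "rest")
        if rest == "{":
            out.append(f"{pad}<{tag}{attrs}>")
            stack.append(tag)
        elif rest is None:
            out.append(f"{pad}<{tag}{attrs}/>")
        else:
            if rest.startswith('"'):
                # Strip the quotes and undo backslash escapes (assumed convention).
                rest = re.sub(r"\\(.)", r"\1", rest[1:-1])
            out.append(f"{pad}<{tag}{attrs}>{escape(rest)}</{tag}>")
    if stack:
        raise ValueError(f"unclosed element(s): {stack}")
    return "\n".join(out)
```

The explicit stack is the only state needed: braces nest exactly like XML start and end tags, so each `}` simply emits the end tag for the most recently opened element.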
The nice thing is that this transition can be done progressively, since new SML-aware programs remain compatible with older programs that only understand standard XML. Simply pipe the data through the sml.tcl script to make them understand each other!
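For a program that only speaks XML, piping through the script can be as simple as running it as a subprocess. The Python sketch below assumes a `tclsh` interpreter on the PATH and sml.tcl in the current directory; `SML_CMD`, `pipe_through` and `round_trips` are hypothetical names, so adjust the command to your setup. Since `sml | sml` restores the original XML, the same call converts in either direction, and `round_trips` checks that reversibility property on a given input.

```python
import subprocess

# Assumed command line: a `tclsh` interpreter on the PATH and sml.tcl in the
# current directory. SML_CMD is a hypothetical name; adjust it to your setup.
SML_CMD = ("tclsh", "sml.tcl")


def pipe_through(data: bytes, cmd=SML_CMD) -> bytes:
    """Send data to cmd's standard input and return its standard output.

    This mirrors `cat file | sml`. Because `sml | sml` restores the input,
    the same call converts XML to SML and SML back to XML.
    """
    result = subprocess.run(cmd, input=data, capture_output=True, check=True)
    return result.stdout


def round_trips(data: bytes, cmd=SML_CMD) -> bool:
    """Return True if converting twice (`sml | sml`) gives back the exact input."""
    return pipe_through(pipe_through(data, cmd), cmd) == data
```

For instance, `round_trips(open("sample.xml", "rb").read())` should return `True` for the sample above, given the behavior described earlier.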
Limitations: The sml.tcl program is well tested (I've used it for years), but it has a few known limitations:
- It does not convert UTF-16, EBCDIC, or MBCS files correctly.
- I've done very little testing with Unicode characters above \u00FF, so I'm not sure it handles them correctly in UTF-8 files.
- I've tested reversibility against the whole libxml2 test suite. Still, my parser definitely does not cover all corner cases of the XML specification. There are surely bugs still hiding there, but hopefully only in the least-used features of XML.
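Until UTF-16 is supported, one possible workaround (not something sml.tcl does itself) is to transcode such files to UTF-8 before piping them through the script. This Python sketch assumes the UTF-16 file starts with a byte-order mark, and it rewrites the `encoding` pseudo-attribute of the XML declaration so the result stays self-consistent. The round trip then yields UTF-8, not a byte-identical copy of the UTF-16 original; EBCDIC and MBCS inputs are not handled. `to_utf8` is a hypothetical helper name.

```python
import codecs
import re


def to_utf8(data: bytes) -> bytes:
    """Transcode BOM-marked UTF-16 XML to UTF-8; return any other input unchanged."""
    is_utf16 = data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
    # A UTF-32 LE BOM begins with the UTF-16 LE BOM bytes; leave it alone.
    if not is_utf16 or data.startswith(codecs.BOM_UTF32_LE):
        return data
    text = data.decode("utf-16")  # the utf-16 codec reads and drops the BOM
    # Keep the XML declaration truthful: encoding="UTF-16" becomes "UTF-8".
    text = re.sub(
        r"""\A(<\?xml[^>]*?encoding=)(["'])[^"']*\2""",
        r"\g<1>\g<2>UTF-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")
```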
Any feedback welcome!