Re: [xml-dev] RE: XML Mantra

The "mantra" also matches my experience: it has recurred in almost every project, for many years. The challenge is to reach a genuine understanding of what difference it makes whether or not we bring XML into play.

The phenomenon: (1) reduced cost of achieving a desired result; (2) increased value of the artifacts created along the way. But which properties of XML are responsible for the phenomenon? I believe we have hardly begun to understand the principles behind it; so far we only observe the effects.

Not an explanation, but an image: XML-encoded information is charged with potential energy (reduced entropy) which we can tap and turn into work. Turning non-XML into XML is like pumping energy into the input material, which is henceforth at our disposal. (Rephrasing the mantra: "Maximize the phase in which to profit from XML potential energy.") Rule of thumb: to get from A to B, the effort depends critically on whether A is XML, and only little on the format of B. (A *should* be XML; B may be anything.)

~ ~ ~

I want to demonstrate the phenomenon with a small but complete example.

Input: CSV data

Output: a text report rendering search results on the input data.

Pattern: XML is not involved in the problem; XML is the solution

Consider this semicolon-separated CSV data about drivers (name, age, city), in which some values are missing:

===============================

michael;62;Reading
bill;;
andy;32;London
tom;28;Reading
chris;;Reading
paul;41;

===============================

Consider this desired result:

===============================

age > 30       : michael andy paul
city = Reading : michael tom chris
lacking: city  : bill paul
only: name     : bill

===============================

An XML-based solution:

(a) Use XQuery to turn CSV into XML (8 lines of code; see PS)

(b) Use XQuery to produce the report (6 lines of code)

The following XQuery code, evaluated against the XML produced in step (a), creates the complete report (four queries in six lines):

===============================

(: context item: the XML generated from the CSV; each { } holds one query :)
string(<_>
age > 30       : { //driver[age/number(.) gt 30]/name/string() }
city = Reading : { //driver[city eq 'Reading']/name/string() }
lacking: city  : { //driver[not(city/string())]/name/string() }
only: name     : { //driver[empty((* except name)[string()])]/name/string() }
</_>)

===============================
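
A further illustration of how little an additional question costs once the data is XML: grouping the drivers by city. This is only a sketch and not part of the report above. It assumes an XQuery 3.0 processor (for "group by") and the generated XML as context item; the label "(no city)" is an arbitrary choice.

===============================

(: one line per city; drivers with an empty city share the empty-string group :)
string-join(
    for $d in //driver
    group by $city := $d/city/string()
    order by $city
    return concat(
        if ($city) then $city else '(no city)', ' : ',
        string-join($d/name/string(), ' ')),
    '&#10;')

===============================

For the six drivers this should produce one line each for "(no city)", London and Reading, with the driver names in document order.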

Summary:

(1) The cost:

8 + 6 lines of XQuery code

(2) The benefits:

(a) the desired report

(b) a generic query to turn CSV into XML (supporting customized element names)

(3) The principle: use XML

Hans-Juergen

PS: A generic query (XQuery 3.0) for turning CSV into XML, supporting customized element names:

===============================

(: $uri   : location of the CSV file
   $names : space-separated element names: root, row, then one per column :)
declare variable $uri external;
declare variable $names external;
declare variable $enames := tokenize(normalize-space($names), ' ');
element {$enames[1]} {
    for $line in unparsed-text-lines($uri)[string()] return
        element {$enames[2]} {
            for $cell at $nr in tokenize($line, ';') return
                element {$enames[2 + $nr]} {$cell}}}

===============================
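
The query above relies on every line having no more cells than there are column names; with a surplus cell the name expression would be empty and the element constructor would fail. A slightly more defensive variant might look as follows. It is a sketch only: the external variable $sep and the fallback element name "cell" are assumptions of this sketch, not part of the original query.

===============================

declare variable $uri external;
declare variable $names external;
(: hypothetical extra parameter: the cell separator, read as a regular expression :)
declare variable $sep external := ';';
declare variable $enames := tokenize(normalize-space($names), ' ');
element {$enames[1]} {
    for $line in unparsed-text-lines($uri)[string()] return
        element {$enames[2]} {
            for $cell at $nr in tokenize($line, $sep) return
                (: surplus columns fall back to the hypothetical name 'cell' :)
                element {($enames[2 + $nr], 'cell')[1]} {$cell}}}

===============================

Because $sep is passed to tokenize() as a regular expression, a separator such as "|" would have to be escaped by the caller.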

PPS: Using BaseX (download from: http://basex.org/products/download/all-downloads/ ), with the CSV data stored as drivers.csv and the query above stored as csv2xml.xq, the following command:

basex -b uri=drivers.csv "-b names=drivers driver name age city" csv2xml.xq

created this XML:

<drivers>
    <driver>
        <name>michael</name>
        <age>62</age>
        <city>Reading</city>
    </driver>
    <driver>
        <name>bill</name>
        <age/>
        <city/>
    </driver>
    <driver>
        <name>andy</name>
        <age>32</age>
        <city>London</city>
    </driver>
    <driver>
        <name>tom</name>
        <age>28</age>
        <city>Reading</city>
    </driver>
    <driver>
        <name>chris</name>
        <age/>
        <city>Reading</city>
    </driver>
    <driver>
        <name>paul</name>
        <age>41</age>
        <city/>
    </driver>
</drivers>
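
As a small check of the rule of thumb (once A is XML, the format of B matters little), the XML above can be turned back into the original CSV lines. A minimal sketch, assuming the generated document is the context item:

===============================

(: one CSV line per driver; empty elements become empty cells :)
string-join(
    for $d in //driver return
        string-join($d/*/string(), ';'),
    '&#10;')

===============================

For the driver "bill" this gives "bill;;", the same line as in the input.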