graphical user interface (GUI) for XML stream processing


From: Erik Sjölund <erik.sjolund@gmail.com>
To: xml-dev@lists.xml.org
Date: Sat, 4 Jun 2011 10:19:26 +0200

Hi,
I would like to present an idea I have for a graphical user interface
(GUI) for simple cases of XML stream processing. I have started
writing the program, but it isn't finished yet. Because I am
considering whether it could be commercialized, it is unfortunately
not available for download at the moment.

The typical user of the program would be someone who needs to
repeatedly perform a split operation on a very big, repetitive XML
file (possibly larger than the available RAM).

If we, for instance, want to split a very big XML file (persons.xml)

<worldpopulation>
  <person firstname="John" lastname="Smith" age="60"/>
  <person firstname="Jane" lastname="Brown" age="58"/>
</worldpopulation>

into separate files with one file per person

$ cat /tmp/Smith-60.txt
<person firstname="John" lastname="Smith" age="60"/>
$ cat /tmp/Brown-58.txt
<person firstname="Jane" lastname="Brown" age="58"/>

we would create an XML file like this in the GUI (mostly by mouse-clicking)

<element name="worldpopulation" namespace="">
 <element name="person" namespace="">
  <attribute name="age" namespace="">
    <intVariable variableName="ageVar"/>
  </attribute>
  <attribute name="lastname" namespace="">
    <stringVariable variableName="lastnameVar"/>
  </attribute>
  <saveFileSplit>
     <filepath>
        <text>/tmp/</text>
        <replaceVariableValue variableRef="lastnameVar"/>
        <text>-</text>
        <replaceVariableValue variableRef="ageVar"/>
        <text>.txt</text>
     </filepath>
  </saveFileSplit>
 </element>
</element>

The file defines the stream operations that should be applied to the
file persons.xml.
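
To make the idea concrete, here is a minimal Python sketch of what the
saveFileSplit operation above amounts to. It is not the program's
implementation (which is not public); the function name split_persons
and the out_dir parameter are made up for illustration. It assumes the
input has the shape of persons.xml and uses the standard library's
iterparse so that the whole file never has to fit in memory.

```python
import os
import xml.etree.ElementTree as ET


def split_persons(source, out_dir):
    """Write each <person> element to <out_dir>/<lastname>-<age>.txt.

    Hypothetical helper: `source` is a file name or a binary file object.
    """
    written = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "person":
            lastname = elem.get("lastname")   # plays the role of lastnameVar
            age = int(elem.get("age"))        # plays the role of ageVar (an int)
            path = os.path.join(out_dir, f"{lastname}-{age}.txt")
            with open(path, "wb") as out:
                # strip() drops any whitespace that followed the element
                out.write(ET.tostring(elem).strip() + b"\n")
            written.append(path)
            root.clear()  # drop already-processed children to keep memory flat
    return written
```

Calling split_persons("persons.xml", "/tmp") on the example input would
create /tmp/Smith-60.txt and /tmp/Brown-58.txt. Note that attribute
values end up in file names here, so untrusted input would need
sanitizing first.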

In addition to the saveFileSplit operation, I have also implemented
these split operations:

* runcommand (a user-specified command is run once per split, with the
split written to the command's stdin)
* libxslt (executes an XSLT stylesheet on each split)
* tokyocabinet (stores the splits in a B-tree key-value database)
* printvariables
* flot (http://code.google.com/p/flot/)

The last two do not use the subtree of the split, only the
user-defined variables.
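
As an illustration of the runcommand operation described in the list
above, the following Python sketch runs a command once per split and
feeds the split to it on stdin. The function name
run_command_per_split and its parameters are hypothetical, and this is
only a guess at the behaviour, not the program's code.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_command_per_split(source, tag, command):
    """Run `command` once per <tag> element, passing the element on stdin.

    Hypothetical helper: returns the captured stdout of each run as bytes.
    """
    outputs = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            result = subprocess.run(
                command,
                input=ET.tostring(elem),  # the split, serialized as bytes
                stdout=subprocess.PIPE,
                check=True,               # raise if the command fails
            )
            outputs.append(result.stdout)
            elem.clear()  # free the element's content; the empty shell remains
    return outputs
```

For example, run_command_per_split("persons.xml", "person", ["wc", "-c"])
would run wc once for every person element.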

The tokyocabinet split operation is right now just a proof of concept.
By clicking in the GUI you can create a B-tree compare function that is
built up from a list of user-defined variables, for instance

<btreecomparefunction>
 <variable name="lastnameVar" gtOrLt="greater than"/>
 <variable name="firstnameVar" gtOrLt="greater than"/>
</btreecomparefunction>

The tokyocabinet split operation stores the splits in a Tokyo Cabinet
B-tree database and, after finishing, prints out the splits in sorted
order. I plan to create Zorba XQuery, XQilla and libxslt external
functions for retrieving values from the Tokyo Cabinet B-tree
database.
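
The compare function above can be read as a multi-key ordering. The
Python sketch below shows that reading; it assumes that the variables
are compared in the order listed and that each sorts ascending, which
the message does not state explicitly. It uses an in-memory sort as a
stand-in for the B-tree and does not use the Tokyo Cabinet API. The
names make_compare and people are hypothetical.

```python
from functools import cmp_to_key


def make_compare(variable_names):
    """Build a compare function from an ordered list of variable names.

    Assumption: earlier variables take priority and all sort ascending.
    """
    def compare(a, b):
        for name in variable_names:
            if a[name] != b[name]:
                return -1 if a[name] < b[name] else 1
        return 0  # equal on every listed variable
    return compare


# Each split is represented here only by its user-defined variables.
people = [
    {"lastnameVar": "Smith", "firstnameVar": "John"},
    {"lastnameVar": "Brown", "firstnameVar": "Jane"},
]
compare = make_compare(["lastnameVar", "firstnameVar"])
ordered = sorted(people, key=cmp_to_key(compare))
```

With this ordering, the Brown record sorts before the Smith record.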

This was a very sketchy overview of the program, but by looking at some
screenshots[1] you can get a better understanding of how the program
works.

What do you think about this program? Could it be useful?
If you have any ideas regarding commercialization of the program,
please contact me by private email.
cheers,
Erik Sjölund

[1] http://www.adivo.se/screenshots-2011-06-04.zip
(md5sum: 86ad29d405ebddf3920b4a9c64e0d8e0)
