analysis of temporal XML data -- research ideas
- From: "Johannes.Lichtenberger" <Johannes.Lichtenberger@uni-konstanz.de>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Sat, 09 Apr 2011 13:15:45 +0200
Hello,
since I'm not yet sure in which direction my master's thesis will go,
here are some research ideas for analysing temporal XML data.
What I've done so far:
- written an XMLUpdateShredder, which shreds updates into our revisioned
database (it does not compute the minimum-cost edit distance between
two revisions)
- sorted Wikipedia (with its edit history) using Hadoop and written an
importer (not yet tested on the real Wikipedia data)
- implemented a diff between two revisions inside the database itself
(since it is based on the internal node encoding, it finds the minimum
edit cost)
- developed a GUI with a simple text view, a tree view and a sunburst
view, which supports exploratory visualization and modification and can
also visualize the changes between two revisions (this might be extended
to more than two revisions, but that seems to be very complex)
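The in-database diff relies on the internal node encoding. As a minimal sketch, assuming each node carries a key that stays stable across revisions (a hypothetical simplification of the real encoding), classifying nodes as same/inserted/deleted/updated reduces to comparing two key-to-content maps. `diff_revisions` and `Change` are illustrative names, not an existing API, and moves or sibling reordering are deliberately ignored.

```python
from enum import Enum


class Change(Enum):
    SAME = "same"
    INSERTED = "inserted"
    DELETED = "deleted"
    UPDATED = "updated"


def diff_revisions(old, new):
    """Classify every node key of two revisions.

    Each revision is modelled (hypothetically) as a dict mapping a node
    key, assumed stable across revisions, to that node's content, e.g. a
    (name, text) tuple. Moves and sibling reordering are not detected.
    """
    result = {}
    for key, content in old.items():
        if key not in new:
            result[key] = Change.DELETED
        elif new[key] != content:
            result[key] = Change.UPDATED
        else:
            result[key] = Change.SAME
    # Keys that only exist in the newer revision were inserted.
    for key in new.keys() - old.keys():
        result[key] = Change.INSERTED
    return result


rev1 = {1: ("article", None), 2: ("title", "XML"), 3: ("p", "old text"), 5: ("p", "gone")}
rev2 = {1: ("article", None), 2: ("title", "XML"), 3: ("p", "new text"), 4: ("p", "added")}
for key, change in sorted(diff_revisions(rev1, rev2).items()):
    print(key, change.value)
```

Unlike diffing two independent XML documents, no node matching is needed in this sketch, because the keys already identify corresponding nodes.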
Since some XML files are rather huge, it would be great to have some
kind of overview besides pruning (which is optional and cuts off nodes
below a certain depth), for example by clustering the data with a text
clustering algorithm to find similar topics. This could be done through
a force-directed visualization comparing two revisions:
- first, select nodes via an XPath expression
- cluster the selected nodes with a text clustering algorithm (which
could later be combined with some structural analysis)
- lay out the clustered nodes with a force-based layout, where the color
of a node encodes whether it is the same, inserted, deleted or updated
- clicking on a node (or perhaps even hovering over it) is synchronized
with the SunburstView to provide a detailed hierarchical overview of the
changes in that subtree
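The selection, clustering and color-encoding steps could be prototyped roughly as follows. This is a sketch under several assumptions: Python's `xml.etree.ElementTree`, which supports only a limited XPath subset, stands in for the database's XPath engine; single-linkage clustering over Jaccard token similarity stands in for a real text clustering algorithm; and the `status` map of change kinds per page id is made-up input that would in practice come from a revision diff. The names `cluster`, `layout_input` and `CHANGE_COLORS` are hypothetical, and the force-directed layout itself is left out.

```python
import re
import xml.etree.ElementTree as ET

# Arbitrary palette: one color per change kind (illustrative only).
CHANGE_COLORS = {"same": "gray", "inserted": "green",
                 "deleted": "red", "updated": "orange"}


def tokens(text):
    """Lower-cased word tokens of a text node (empty set for None)."""
    return set(re.findall(r"\w+", (text or "").lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(texts, threshold=0.3):
    """Single-linkage clustering via union-find: texts share a cluster if
    a chain of pairs with similarity >= threshold links them.
    Returns one label per text, numbered by first appearance."""
    parent = list(range(len(texts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(t) for t in texts]
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(toks[i], toks[j]) >= threshold:
                parent[find(i)] = find(j)

    labels, numbering = [], {}
    for i in range(len(texts)):
        labels.append(numbering.setdefault(find(i), len(numbering)))
    return labels


def layout_input(root, path, status):
    """Select nodes by a (limited) XPath, cluster their text and attach a
    color per change kind. A force-directed layout could consume these
    tuples, e.g. adding attractive edges between nodes of one cluster."""
    nodes = root.findall(path)
    labels = cluster([n.findtext("text") for n in nodes])
    return [(n.get("id"), label, CHANGE_COLORS[status[n.get("id")]])
            for n, label in zip(nodes, labels)]


doc = ET.fromstring(
    "<wiki>"
    "<page id='1'><text>XML databases store trees</text></page>"
    "<page id='2'><text>native XML databases store revisions</text></page>"
    "<page id='3'><text>football results of the season</text></page>"
    "</wiki>")
status = {"1": "same", "2": "updated", "3": "inserted"}  # made-up diff result
for entry in layout_input(doc, ".//page", status):
    print(entry)
```

Single linkage is chosen here only because it needs no parameters beyond a threshold; a real system would more likely use TF-IDF vectors with k-means or a similar method.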
Another direction would be to develop an index structure for tracking
changes via XPath extensions (time axes), which has already been done by
some folks at ETH in TimeMachine, and there is further research in this
area.
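As one very simple building block for such an index (an illustrative assumption, not the TimeMachine design), each node key could map to a sorted list of the revisions in which the node changed; binary search then answers time-axis style questions such as "in which revisions between r1 and r2 did this node change?" or "which revision last changed this node at or before r?". `ChangeIndex` and its methods are hypothetical names.

```python
from bisect import bisect_left, bisect_right


class ChangeIndex:
    """Hypothetical index: node key -> sorted revision numbers in which
    the node was inserted, updated or deleted."""

    def __init__(self):
        self._changes = {}

    def record(self, node_key, revision):
        """Register that node_key changed in revision (idempotent)."""
        revs = self._changes.setdefault(node_key, [])
        i = bisect_left(revs, revision)
        if i == len(revs) or revs[i] != revision:
            revs.insert(i, revision)  # keep the list sorted, no duplicates

    def changed_between(self, node_key, start, end):
        """Revisions in the closed range [start, end] that changed node_key."""
        revs = self._changes.get(node_key, [])
        return revs[bisect_left(revs, start):bisect_right(revs, end)]

    def last_change_at_or_before(self, node_key, revision):
        """Latest revision <= revision that changed node_key, or None."""
        revs = self._changes.get(node_key, [])
        i = bisect_right(revs, revision)
        return revs[i - 1] if i else None


index = ChangeIndex()
for node_key, revision in [(7, 1), (7, 4), (3, 2), (7, 9), (7, 4)]:
    index.record(node_key, revision)

print(index.changed_between(7, 2, 9))         # [4, 9]
print(index.last_change_at_or_before(7, 8))   # 4
print(index.last_change_at_or_before(7, 0))   # None
```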
A third alternative would be to develop further visualizations, for
which I also have some ideas.
I'm still not sure which direction to take or what would be of value
for "real world" applications. I'm open to any suggestions.
regards,
Johannes