   XML waves (was : Divorcing Data Model and Syntax)

[Jeff Lowery]
>Now, if we could just represent an XML document as a wave...

For analysis purposes, that could be pretty trivial - if you view the
characters in text nodes as each representing a unit time delay, and the
depth of nesting as the amplitude. Not sure how you'd handle attribute
data - perhaps in another dimension. What could you do with it? Well, a lot
of the smarter search/index clustering techniques are effectively
signal-based. Doing a (discrete) Fourier transform of an XML wave would
provide a kind of signature, which would be shared by documents that are
structurally similar. There are loads of filtering and cross-correlation
techniques available once you have your wave.
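
To make that a bit more concrete, here is a rough sketch in Python of the
mapping described above: one sample per text character, with the nesting
depth as the amplitude, followed by a naive DFT. The function names
(xml_wave, dft_signature) and the choice of keeping only the first 8 bins
are my own assumptions for illustration, and attributes are simply ignored.

```python
import cmath
import xml.parsers.expat


def xml_wave(xml_text):
    """Return one sample per text-node character; the value is the nesting depth."""
    wave = []
    depth = 0

    def start(name, attrs):
        # Attributes are ignored in this sketch.
        nonlocal depth
        depth += 1

    def end(name):
        nonlocal depth
        depth -= 1

    def chars(data):
        # Each character is one unit of "time" at the current depth.
        # Whitespace between elements counts as text here.
        wave.extend([depth] * len(data))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return wave


def dft_signature(samples, bins=8):
    """Magnitudes of the first few DFT bins, scaled by the signal length.

    A naive O(n * bins) DFT; the bin count is an arbitrary assumption.
    """
    n = len(samples)
    if n == 0:
        return []
    signature = []
    for k in range(min(bins, n)):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(samples))
        signature.append(abs(coeff) / n)
    return signature


wave = xml_wave("<a>hi<b>yo</b>!</a>")
signature = dft_signature(wave)
```

For the tiny document above the wave is [1, 1, 2, 2, 1], and the first
signature value is just the mean depth. Whether comparing such signatures
really groups structurally similar documents would need trying on real data.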

I doubt whether the inverse transform would be all that useful, but: years
back I had an incredibly dull electronics lecturer. The only time I was
really awake in front of him was when he covered Laplace transforms. He
showed a kilometre of formulae describing the relationship between the input
& output of a circuit. Then he demonstrated the Laplace technique - did a
transform, then a sum, then the inverse transform - exactly the same result.
Gobsmackingly elegant. Perhaps the XML wave + Fourier view has similar

The signal analysis approach to documents is wide open, but in many/most
minds it's orthogonal to the logical approach found in e.g. RDF.
But if you look at how productive simple numeric algorithms like those used
by Google can be for automatically discovering semantic relations (however
weak in Google's case) then IMHO it becomes clear that Semantic Web research
should be delving into these areas. Word/character analysis can be very
useful, as can statistical/neural techniques like Bayesian analysis and
Kohonen's SOM, but these are usually applied at a low level (in respect of
structure) to documents. Google takes into account the linking, but this
information is still (apparently) handled at quite a trivial level.
So a wave view of XML is a great idea!



